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Exercises - Part 1 


True or false: 

1. The Ani29116 is externally TTL compatible, 
but uses ECL ciruitry internally. 

2. The Am29116 is expandable (i.e. two can be hooked together). 

3. The Am29116 is for 8-bit or 16-bit intelligent controllers. 

4. The Am29116 can perform conditional testing 
on its status register. 

5. The barrel shifter rotates 1 to 15 bits 
up or down in one microcycle. 

6. The Am29116 must be used with an Am2904. 

7. The Am29116 can perform immediate operations. 

8. The Am29116 has a choice of four input sources to its data 
MUX's which in turn provide three ALU inputs, R, S, U. 

9. The Am29116 can perform three-address instructions. 

10. Fast clock speed is synonymous with high throughput. 

11. The Am29116 can generate remainders up to 16 bits long 
from CRC polynomials. 
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Exercises - Part 1 (continued) 


12. The Am29116 always has its ALU output at . 

13. The ALU destinations are RAM, ACC, D-Latch. 

14. Single-operand instructions are PASS, COMPLEMENT, 

INCREMENT, and TWO's COMPLEMENT. 

15. D(0E) (D with zero extend) is used 
for two's-complement arithmetic. 

16. "R Dest" calculates one's-complement, 

and "R + 1 --> Dest" calculates two's complement. 

17. The Am29116 can perform NANO, NOR, EXOR in one microcycle. 

18. Shift up can use 0, 1 or the QLINK bit as input to the LSB. 

19. Shift down uses 0, 1 or the QLINK bit 
as the only input choices to the MSB. 

20. Rotate operates in byte or word mode. 

21. Rotate uses the U-input to the ALU. 

22. Load 2 ^ causes a mask (1 in a field of 0's) to be 
generated and can be used for loading RAM, ACC. 

23. Read Y-bus, change a bit, output to Y-bus is possible 
in one microcycle with the Am29116. 

24. If you perform a bit-oriented instruction on the ACC, 
the destination is the ACC or the RAM. 

25. There are 17 possible results when priority encoding a word. 

26. Byte-mode prioritizing ignores bits 8-15. 


ADVANCED MICRO DEVICES ^ 






40 


ED29116 


3-40 


m 


Exercises - Part 1 (continued) 


27. The Am29116 can perform operations on polynomials 
of degree 16 or less. 

28. 95% of CRC calculations use polynomials with 16-bit remainders. 

29. The CRC calculations can be done in a forward or reverse mode, 
(i.e. either transmit bit 0 first or transmit bit 15 first) 

30. CRC remainders can be calculated in byte or word mode. 

31. The status word can be loaded from D, RAM, or ACC. 

32. The Z, C, N and OVR status bits can be loaded 

without affecting LINK or the user flags. 

33. You can set or reset the entire status word. 

34. You can set or reset the ALU flags individually. 

35. On the Am29116, you can load both the status register 

and the ACC in the same microcycle if the RAM is the source. 

36. If the status-register enable is high, the status register 
is "frozen". That is, no operation can alter status. 

37. All conditional testing is performed on the stored 
values in the status register. 
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Exercises - Part 1 (continued) 


Fill in or answer: 


38. How many bits are in the Am29116 status register? 

39. How many conditional tests can be made? 

40. Can you perform a conditional test during another 
instruction? If so, how? 

41. Can byte operations be performed in either the upper byte 
or the lower byte of a register? 

42. Can the Am29116 support a 100 ns microcycle time? 

43. List three possible sources for single operand instructions. 

44. Can you load a byte into the D-latch? 

45. List three possible source pairs for two-operand instructions. 

46. Show the content of this register after a word-mode rotate with n=2. 

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 
1100101000110101 

47. Does the Am29116 allow you to rotate R3 
and place the result back into R3? 

48. Does the Am29116 allow you to rotate R3 
and place the result in R7? 

49. If there is no active priority request, what result 
is produced by a prioritize instruction in word mode? 
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Exercises - Part 1 (continued) 


50. If bit 15 is active, what result is produced 
by a prioritize instruction in word mode? 

51. If bit 15 is active, what result is produced 
by a prioritize instruction in byte mode? 

52. Is it true that the Am29116 is an order of magnitude faster 
than the Am2901 which in turn is an order of magnitude faster 
than the AmZ8000 for certain controller-oriented operations? 

53. Can the Am29116 be used to do multiply? 

If so, outline the required code. 

54. Can the Am29116 be used for bit operations? 

55. Can the Am29116 be used for rotate operations? 

56. Does the Am29116 have an ALU? 

57. Can the Am29116 be used to build a CPU? 

58. If a mask bit is zero in a ROTATE-and-MERGE instruction, from 
which operand is the corresponding bit passed to the destination? 

59. If U = 0011 0001 0101 0110 

R = 1010 1010 1010 1010 

mask = 0101 1010 0110 1001 

what bit pattern is produced by a word-mode 

ROTATE-and-MERGE instruction with n=4? 

60. If the highest bit position with a one is position 7 
and the mask is 1010 what is the result 

of a word -mode prioritize instruction? 

61. If the highest bit position with a one is position 7 
and the mask is 1010 what is the result 

of a byte -mode prioritize instruction? 
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Exercises - Part 1 (continued) 


62. What is the result of loading the complement of 2*^? 

What type of instruction allows you to do this? 

63. Suppose you want to do a word-rotate down five bit 
positions. How do you do this on an Am29116? 

64. What is QLINK? 

65. Can you set and reset the ALU status bits individually 
as you can with the Am2904 micro-status register? 

66. Can you set & reset the LINK and FLAG status bits individually? 

67. In byte mode, are all 8 bits of the status register loaded? 

68. Which instructions do not cause the ALU status bits 
to be updated? 

69. When are the upper four status bits 
(LINK, FLA61, FLAG2, FLAG3) changed? 
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Solutions for Exercises - Part 1 (continued) 


1. True 

2. False 

3. True - almost every instruction operates in byte or word mode. 

4. True - using either the instruction lines - I 4 or the 

Tj^ - T 4 lines. 

5. False - the barrel shifter only rotates 1 to 15 bits. 

An effective down rotate is achieved by choosing an appropriate 
number of bits to rotate up such that the result is equivalent 
to the desired down rotation. 

(16-i in word mode; 8 -i in byte mode) 

6 . False - but an Am2904 is useful for emulations 

because of its bit-settable status register. 

7. True - requires two microwords and two microcycles. 

8 . True - ACC, D-Latch, RAM 

and the instruction lines (for immediate data). 

9. False - the Am29116 can be extended only to a two-register-address 

structure (using an external MUX, an additional 5-bit 
microinstruction field and extended timing). 

10. False! - recall the problems that occur on branching with a 

double-pipelined CPU based on Am2910 & Am2901 or Am2903 
and the need for extra NOP instructions. 

11. True 
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Solutions for Exercises - Part 1 (continued) 

12. True OEy is enabled. Otherwise, false. 

13. True with reservations - the D-latch can be used but 
requires some tricky timing. You could generate a race 
condition if the D-latch is also a source for the 
operation. The D-latch is not intended to be a 
destination, and its use as such is not recommended . 

"Normal" destinations are RAM, ACC or NONE. 

14. True. (PASS is another term for MOVE). 

15. False - D(SE) (D sign-extended from bit 7) 

is used for two's complement arithmetic 

16. True 

17. True 

18. True - shift uses the 0, 1 or QLINK. The inputs to the carry MUX 

should not be confused with the inputs to the shift MUX. 

19. False - QC, QNOQOVR and QLINK are also available. 

Do not confuse the use of Q in the Am29116 context 
(designating status-register contents) with the 
Q-shifter or Q-register of Am2901, Am2903 and Am29203. 

20. True 

21. True 

22. True - also goes to internal Y-bus 

23. True - be careful with the timing 

24. False - ACC only. 

BONR instruction has common source/destination field. 

25. True - none and 1 thru 16. 

26. True 
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Solutions for Exercises - Part 1 (continued) 

27. True 

28. True - according to an AMD survey. 

29. True 

30. False - Word mode only. But short polynomials that produce 

an 8-bit remainder can be used. 

31. True - the status can also be loaded from immediate data as well. 

32. True 

33. True 

34. False - if you need this use an Am2904. 

35. False - this is the one source which precludes 

loading both the ACC and the status register. 

36. True 

37. True 


Answers to Fill-in or Answer Section: 


38. Eight - Z, C, H, OVR, LINK, FLAGl, FLAG2, FLAG3 

39. There are 12 condition code test signals - 

You can test all 8 status bits individually. You can force a LOW. 
And you can test 3 combinations: Z+C, N@0VR, (N®0VR)+Z. 

40. Yes - by using the T-bus lines as input. This requires a wider 

microword to control the T-bus. 
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Solutions for Exercises - Part 1 (continued) 


41. In byte mode, instructions alter only the lower byte. 

42. Yes - Very carefully ! 

This requires a register between the sequencer and the control 
store as well as the pipeline register at the output of the 
control store. 125 ns is more easily achieved. 

43. RAM, ACC, D, I, Zero. 

44. No. 

45. RAM - ACC, RAM - I, 0 - RAM, D - ACC, ACC - I, D - I 

46. Rotate is up only - 

1100 1010 0011 0101 n = 2 becomes 
0010 1000 1101 0111 

47. Yes 

48. Yes " with an external MUX and care with timing. 

49. Zero. 

50. One. 

51. Bit 15 does not participate in the byte mode. 

The result will depend on the content of the lower byte, 

52. Yes it is true. Especially when multi-bit rotation, priority 
determination or CRC remainder calculations are needed. 
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Solutions for Exercises - Part 1 (continued) 


53. Yes - but it was not optimized for this application. 

Consider a 16 x 16-bit unsigned multiplication: 

i) Initialize the partial product (PP) and a counter to zero. 

ii) Increment the counter. 

iii) Test the MSB of the multiplier - 

If the MSB is a one then: PP+multiplicand --> PP 

iv) If count equals 16 then END 

else upshift the PP and the multiplier. 

(remember to carry from the LSH to the MSH of the PP) 

V) Goto ii). 


Look at the Am29116 instruction set! We can write a better 
procedure for the Am29116 than the above. 

With the use of the prioritize and the rotate instructions 
you can reduce the number of microcycles. 

(But now the number of microcycles depends on the multiplier!) 

An improved algorithm is as follows: 

i) Initialize PP and counter to zero. 

ii) Prioritize the multiplier & call the result 'prio'. 

If (prio = 0) OR (prio > (16-counter)) 
then rotate PP by (16-counter) and END. 

iii) count+prio --> count 

iv) Rotate PP and multiplier by prio. 

v) PP+multiplicand --> PP 

vi) Goto ii). 
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Solutions for Exercises - Part 1 (continued) 


54. Yes 

55. Yes - The barrel shifter works in the byte or word mode. 

56. Yes - The Am29116 has a 16-bit three -operand ALU. 

57. Yes - but it was not optimized for this application. 

58. The R-operand bit is passed to the destination. 

59. The result is 1011 000 1110 0011. 

60. The result is 9. 

Note: <mask bit i equal to zero> allows participation of 
the source bit i in the priority determination. 

61. The result is one. 

62. This clears the n^^ bit and sets all of the other bits 
in the destination. The instruction is: 

•BORl, LDC2NR' or 'BONR, LDC2NA' or 'BONR, LDC2NY'. 

63. Rotate ^ with n=ll. ... (16-5=11) 

example; 0000 1111 0101 1010 
1101 0000 0111 1010 
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Solutions for Exercises - Part 1 (continued) 


64. QLINK is the linkage bit for shift operations. It is also 
used by CRC instructions to bring in each bit of serial data. 

65. You cannot set/reset the ALU status bits one by one. 

If you need this feature, use the Am2904 as an 
additional external status logic. 

66. Yes 

67. In byte mode, the lower 4 bits (ALU status) on the 
status register are loaded. 

68. NOOP, save-status, test-status _ _ 

or any instruction if either of lEN or SRE are high. 

69. The upper four status bits (LINK, FLAG 1, FLAG 2, 

FLAG 3) are changed during status set/reset; status load 
(word mode only); plus QLINK is updated after each shift 
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CHAPTER 4 


Ani29116 Bit-Mapped Graphics Controllers 
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Ani29116 Bit-Mapped Graphics Controllers 


The Am29116 has proved to be very popular amongst designers of 
graphics controllers. This is a natural result of the fact that the Am29116 
has a very suitable instruction set for the kinds of algorithms that arise in 
graphics applications. 

We will first discuss this subject in general using the article* by 
Chu and Miller as a reference. Then we will examine in detail one of the most 
useful procedures for bit-mapped raster scan graphics, the efficient drawing 
of a straight vector, and show how this is coded for the Am29116. By taking 
both an overall look at the requirements of bit-mapped graphics and by follow¬ 
ing a particular algorithm in detail, we are going to demonstrate why the 
Am29116 has been so successful as a graphics controller. 


* "Microprocessor Architecture Suits Bit-Mapped Graphics" 

by Paul Chu and Warren Miller, Electronic Design, Jan.20,1983 p.l43 
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Display Memory (logical) 


Bit-Map Memory (physical) 



10-bit 

10-bit 

16-bit 

4-bit 

address 

address 

address 

address 

Y 

X 

Y' 

X' 

row# 

column# 

word# 

bit# 










4-26 


ED29116 


4-26 



The mapping process: 


logical row# logical column# 

6 bits 10 bits 6 bits 10 bits 



physical word address 


physical bit address 
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Drawing a Vector on a Two-Dimensional Raster Grid 



We would like to find an efficient procedure to draw a straight 
vector between two points. When given the coordinates of two end points, 
and Pg, located on a raster grid, the procedure should set the bits in screen 
memory associated with these end points and with a set of intermediate points 
closely approximating a straight line. We will describe a procedure that works 
for lines of unit or less slope that selects the closer of the two candidate 
points for each increment of 'x'. That is, for each 'i' in the above figure, 
the procedure will compare the lengths of 's^-' and 't^' and will select the 
point associated with the shorter length. This procedure can be generalized to 
handle lines with slope exceeding unity. 
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x=x^ x=x^+l x=Xy^+2 x=x^+3 x=x^+4 


- of course at x=x^ and x=X 3 , Y=0 


In the above figure we see a successful application of such a pro¬ 
cedure. Since the intermediate points are selected from a finite grid, they do 
not lie precisely on the ideal straight line linking with Pg. A measure of 
the success of the procedure is the value of the Y^'s, the differences between 
the y-coordinates of the intermediate points and the corresponding vertical 
position of the ideal line. 
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An Algorithm for Fast Vector Plotting with an Ani29116 

For a straight line linking two points and Pg(xg,xg): 

_ 

y - y^^i - xq^ 


Let us define dy=(yg-y^) and dx=(xg-x^). For an arbitrary 
point P(Xp,yp), not necessarily on this line: 

Yp=(yp-y{Xp))-dx = -Xp-dy+yp’dx-xgy^+x^yg 

is proportional to the vertical distance of P from the line. We multiplied 
the distance by 'dx' to obtain a measure of vertical distance that can be 
calculated without performing a division. In all further discussion we will 
use this scaled measure for vertical distances. Since we will apply the same 
scale factor, 'dx', to all vertical distances, this tactic will not affect our 
ability to determine which of two points is closer to an ideal line. 

A simple vector-drawing procedure is as follows; 

1. Plot a point at x^,yy^. 

2. Increment x by one. 

3. If X > Xg then quit. 

4. Calculate Y, the vertical distance error, scaled by 'dx', 
for two points: x,y and x,y+l. 

5. If ABS(Y(x,y)) < ABS(Y(x,y+l)) then goto 6 else y=y+l. 

6. Plot a point at x,y. 

7. Goto 2. 

This procedure may be referred to as a digital differential analyser 
by analogy with numerical methods for solving differental equations. This 
procedure has avoided division and multiplication but it still has redundant 
arithmetic. Two additions are required for the two Y's, we may have to 
perform a two's-complement operation on either or both of the Y's to produce 
absolute values and finally a subtraction is required to compare the Y's. We 
can compress this procedure further. 

We have been choosing the next point based on the sign of 
d = ABS( Y-jQ^^gy,)-ABS( Y(jppg^) 

(We will now call 'd' the "discriminant" for our problem). 
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An Algorithm for Fast Vector Plotting with an Am29116 (continued) 


positive. 


But in the situation shown below, 
In this case: 


Yiower is negative and Y^ppg^ is 


^ '^lower'^upper 

and the absolute-value operation is not required. 



Further the i^*^ discriminant can be obtained efficiently from the 
(i-1)^” discriminant. To discover how, consider the Y's involved: 

Yi-i = -Xi.i*dy+ yi.i-dx-Xgy^+x^yB 

Slower = -(Xi.i+l)-dy+ yi-l-dx-Xey^^+Xy^yg = Y^.^-dy 

i 

Yupper. = -(Xi.i+l)*dy+(yi.i+l)-dx-XByA+XAyB = Y^.^-dy+dx 


Therefore: 


*^1 "Yiower “Yypper ” -2Y^_2+2’dy-dx 
i i 


And since Y 0 = 0: 


d^ = 2 -dy-dx 


ADVICED MICRO DiVlCii S 





4-70 


ED29116 


4-70 



An efficient algorithm for plotting a straight vector from given end 
points with minimal arithmetic can now be based on the use of a sequence of 
values of the above discriminant, d. We start with x=x^, y=y;^, d]^=2'dy-dx. 
For each point we check to see if x>xg. If so we terminate the procedure. 
Otherwise we set the bit corresponding to x,y in the screen memory. Then we 
check the sign of d^ and proceed to select the next point and the next dis- 
crimininant, d^-+j_ as follows: 


If is -ve 


If d^- is -s-ve 


x^ = x^-+l x^- = x^+1 

yi = yi yi = yi+i 

(i.e. select lower point) (i.e. select upper point) 


di+i = -2Y^+2*dy-dx 

= -2(Y^'_]^-dy)+2*dy-dx 
= d^+2*dy 

i .e. d^+ 3 ^ = d^+incrl 
where incrl = 2‘dy (always +ve) 


di+i = -2Y^+2*dy-dx 

= -2(Y^-_j^-dy+dx)+2*dy-dx 
= d^+2’dy-2*dx 

i.e. d^+2 = dj+incr2 
where incr2 = 2‘dy-2‘dx (always 

-ve) 


Thus for each point plotted, we require only one addition plus one 


increment for 'x 
is Bresenham's^ 


' and possibly one increment for 
algorithm for a straight line. 


'y'. This improved procedure 
Variations on this algorithm 


can be devised to plot circular and elliptical arcs. 


Reference: Bresenham,J.E. "Algorithm for Computer Control of a Digital Plotter" 

IBM Syst. 0., Vol. 4, No. 1 (1965), pp.25-30 
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Am29116 Microcode for Vector Generation 


Let us show the code that implements Bresenham's algorithm in two 
sections. First let us calculate the various quantities needed before the main 
loop begins. 


We start with: 


We want: 


Reg 

Content 

R00 


R01 

yA 

R02 


R03 

yB 


Reg 

Content 

R00 

xa 


R01 

yA 


R02 

dx=(pixel count)-l 


R03 

incrl = 2 *dy 

— always +ve 

R04 

incr 2 = 2 *dy- 2 ”dx 

— always -ve 

ACC 

d]^ = 2 *dy-dx 



The code required to achieve this result is: 


SOR 

W,MOVE,SORA,R00 

& CONT 

x^ —> ACC 


TORI 

W,SUBS,TORAA,R02 

& CONT 

dx=xg-x^ --> ACC 


SOR 

W,MOVE,SOAR,R02 

& CONT 

dx --> R02 final 

dx 

SOR 

W,MOVE,SOAR,R04 

& CONT 

dx --> R04 


SOR 

W,MOVE,SORA,R01 

& CONT 

—> ACC 


TORI 

W,SUBS,TORAA,R03 

& CONT 

dy=yB-y/\ --> ACC 


SHFTNR 

W,SHA,SHUPZ,NRA 

& CONT 

2*dy —> ACC 


SOR 

W,MOVE,SOAR,R03 

& CONT 

2*dy --> R03 final 

incrl 

TORI 

W,SUBR,TORAA,R02 

& CONT 

2’dy-dx —> ACC final 

dj ** 

TORI 

W,SUBR,TORAR,R04 

& CONT 

2’dy-2*dx --> R04 final 

incr2 


* Note: "SHFTR" is used to multiply by two. 

The alternative of adding RAM to RAM is not available. 

** Note: The particular implementation of the main loop we are about 
to discuss requires that 'd]^‘ be adjusted further. 
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Am29116 Microcode for Vector Generation (continued) 


Program Flow Chart for Main Loop of Bresenham's Vector Algorithm: 
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Am29116 Microcode for Vector Generation (continued) 

The main loop section of the code is as follows: 


TORI W,SUBR,TORAA,R03 
& CONT 

SOR W,INC,SORY,R02 


-dx --> ACC (final dj^) 


W,INC,SORY,R02 ; dx+1 --> Am2910 counter 

& IFNOT CT16 & CT LOW & LDCT JUNK ; Forced pass loads counter 
& JMPI & OEY ; via "JUMP INDIRECT" path. 


& OEY 


; d is -ve 

Instruction #1 of subroutine PLOT ; avoid waste of 1 cycle 
& IFNOT CT16 & CT LOW & CJS PLOT+1 ; unconditional CALL 


TOR W,ADDJORAA,R03 
& RPCT DNEG0 

AAAA Any instruction 


d+incrl --> d 

Last pt plotted? Jmp if not. 
last point has been plotted 


& IFNOT CT16 & CT LOW & CJP BRXIT ; unconditional jmp to exit 


DNEG0: SOR W,INC,SORR,R00 

& IF CT16 & CT N & CJP DNEG 


x+1 --> X 

Test sign of d, branch if -ve 
continue if +ve. 


DPOS: ; d is +ve 

SOR W,INC,SORR,R01 

& IFNOT CT16 & CT LOW & CJS PLOT 

TOR W,ADD,TORAA,R04 

& RPCT DNEG0 


y +1 —> y 
unconditional CALL 

d+incr2 --> d 

Last pt plotted? Jmp if not. 


BRXIT: XXXX ; The vector is now completely drawn. Exit from Bresenham. 

PLOT: ; Hardware-dependant routine to plot point at R00,R01 

YYYY First instruction of PLOT subroutine 
& CONT 

PLOT+1:ZZZZ Second instruction of PLOT subroutine 


LLLL Last instruction of PLOT subroutine 

& IFNOT CT16 & CT LOW & CRTN ; return to caller of PLOT/PLOT+1 
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Am29116 Microcode for Vector Generation (continued) 


Improving the Vector Algorithm: 


The above implementation of Bresenham's algorithm can still be 
improved. As it stands, the code manipulates the logical addresses, 'x' and 
'y', and then plots each point by a procedure ("PLOT") that must convert these 
address coordinates to physical coordinates. It would be better to work with 
the physical addresses directly and hence shorten "PLOT". 

Suppose the physical address consists of a word address,'W, and a 
bit address, 'b'. Suppose further that 'b' is represented by a word, 'B', 
that has a single bit set in the bit position corresponding to 'b'. Then the 
algorithm can be modified as follows: 


1. Replace 'y+1 --> y' by: 

W+(# of pixels per line) —> W ... usually add 2*^ via 'B0R2 A2NR' 
('b' does not need to be altered) 

2. Replace 'x+1 --> x' by: 

Rotate B up one. 

Then if this rotation sets the sign bit, 
conditionally: 'W+1 --> W 


If the code is altered to manipulate 'W and 'B' in this manner, 
then the subroutine, "PLOT", receives the address of the word in the screen 
memory to be modified, along with the bit bit mask with which to set that 
word. This shortens "PLOT" by at least 4 cycles. The implementation of this 
improvement is left as an exercise for the reader. 
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CHAPTER 5 


Intelligent Controllers Based on Am29116 
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A Typical Peripheral Controller 
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BLOCK DIAGRAM OF A TYPICAL PERIPHERAL CONTROLLER 
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Peripheral Controllers 


The functions of a peripheral controller may include: 


s Parallel transmission of data and address information 
between the host computer and the device being controlled. 

i Detection and execution of commands from the host. 

» Provision of status information to the host 

indicating the state of the controller and the controlled device. 

s Serial transmission of data to and from the controlled device. 

g Generation and testing of status, command and timing bits 
that coordinate controller/controlled-device interaction. 

® Execution of calculations or algorithms related 
to the control of the peripheral or to processing data from it. 
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Intelligent Controller 
Low Speed Version 
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Intelligent Controller - Low Speed 
Devices Needed for the Minimum Configuration : 

To control the Am29116: 

@ Am2910 sequencer 
@ Microprogram Memory 

s Pipeline Register (could be incorporated in a registered 

PROM that contains the microprogram) 


To control the sequencer: 

§ 1 1/2 octal D-type flip-flops (Am2920‘s) forming a 12-bit register 
acting as a latch for the direct input to the Am2910 sequencer 


To interface with the host: 

§ 2 8-bit bidirectional I/O ports (Am2950's) connected to the data bus 
and providing the data path to and from the host 

i 2 DMA address generators (Am2940's) 
to drive the host address bus 

§ 1 bidirectional I/O port (Am2950) 

interfacing with the host control bus 


To interface with the peripheral device: 

® 2 bidirectional I/O ports (Am2950's) conveying status and 
command signals to and from the peripheral device 

e 1 serial-to-parallei converter for the serial data stream 
to and from the peripheral device 

9 1 scratchpad or buffer RAM 

And, of course the Am29116 
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Low Speed Disk Controller 



control for peripheral devices 


= output enable 
*CE = clock enable 

block diagram of a minimal configuration of a disk controller 
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Intelligent Controller - Low Speed 
Brief Description of Some of the Elements 
Am2910 Sequencer 

s an address sequencer intended for controlling the sequence 
of the execution of microinstructions stored in microprogram memory 

8 capabilities: 

- fixed width of 12 address bits 

- simple sequential access 

- conditional branching to any microinstruction within 
its 12-bit 4096-word address range 

- incorporates a 12-bit counter for loop control 

- provides a 5-level stack for microsubroutine linkage 
i can accept an address from one of 4 sources: 

- its microprogram address counter 

- a direct input from an external source 

- its internal register 

- its stack 

@ executes 16 instructions as specified by 4 instruction inputs 

@ has 5 control pins to control branching, loading 
of its register, incrementing of its program counter 
and enabling its output bus drivers 
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Intelligent Controller - Low Speed (continued) 


Brief Description of Some of the Elements (continued) 


Ani2920 Octal D-Type Flip-Flop 

« provides eight edge-triggered D-type flip-flops with 

- a buffered common clock 

- a buffered common clock enable 

- a buffered common asynchronous clear input 

- three-state output control 

i the clear input,CLR, resets all eight flip-flops 
independent of all other inputs 

i With the three-state output-enable LOW, all eight outputs 
appear as normal TTL outputs. Otherwise the outputs are 
in a high impedance state. 

§ The clock-enable input, E^ is used to selectively load data 
into the register. When_E is HIGH the register will retain 
its current data. When E is LOW, new data is entered 
on the LOW-to-HIGH transition of the clock input. 
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Intelligent Controller - Low Speed (continued) 

Brief Description of Some of the Elements (continued) 

Am2950 Eight-Bit Bidirectional I/O Port 
§ designed for use as a parallel data I/O port 

i provides 2 back-to-back registers to store data moving in both directions 
between 2 bidirectional three-state busses 

f provides a handshake-flag flip-flop for each data direction 
to allow coordination of demand-response data transfer: 

- Each flag flip-flop is set automatically when a register is loaded. 

- Each flag flip-flop has an edge-sensitive clear input. 

f provides for each register: 

- a clock input 

- a clock enable 

- a three-state output enable 
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Intelligent Controller - Low Speed (continued) 

Brief Description of Some of the Elements (continued) 

The DMA Address Generator - Am2940 

s High speed, cascadable, eight-bit wide 
Direct Memory Access address generator slice 

9 Generates sequential memory addresses for use in the 
sequential transfer of data to or from a memory 

3 Equipped with 

- address counter (increment/decrement) 

- address register (saves the initial address) 

- word counter (increment/decrement) 

- word counter register (terminal count) 

- three-state address ouput buffers 

g 4 control modes 

9 8 different instructions (3 instruction pins) 

- write/read control register (control mode) 

- read word/address counter 

- reinitialize counters 

- load address/word count 

- enable counters 

@ 3 control pins 
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Intelligent Controller - Low Speed (continued) 

Brief Description of Some of the Elements (continued) 


The Scratch Pad Memory 


§ Not always necessary for a peripheral controller 

§ Improves system performance by allowing the host CPU 
and the peripheral device to respond at different rates. 

@ May also be useful in improving the execution speed of 
a controller algorithm by providing quick-access storage 
for variables or look-up tables. This traffic can thus be 
kept off of the main bus. 
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Intelligent Controller - Low Speed (continued) 


Microword 


s 16 instruction bits for the Am29116 that are 
shared with the 12 data bits to the Am2910 that provide for 
loading the counter and providing branch addresses 

9 1 bit to determine the destination of the above lines 
(to either the Am29116 or the Am2910). Use the lEN input 
of the Am29116 to disable it on those microcycles when the data 
on the instruction lines is intended for the Am2910. 

e 4 instruction bits for the Am2910 

e 3 instruction bits for the Am2940 

• Using this technique of sharing microinstruction fields, 
a microword of 28 bits is possible. Of course, there is 

a speed penalty caused by the need to halt the Am29116 

when data is passed to the sequencer. 
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Peripheral Controller with Maximal Performance 



liE-and control 
for peripheral devices 


block diagram of a maximal configuration of a disk controller 
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Intelligent Controller - High Speed (continued) 


Additional Elements 

s 2 multiplexers: 

- to define different Am29116 source and destination registers 

- to select the branch condition for the Ani2910 

§ Am2925 clock generator and microcycle length controller. 

For extended timing required by Am29116 two-address operation. 

9 Am2904 status and shift control unit: 

- to add more flexibility to the Am29116 status tests 

- use only the Am2904 micro status register and its 
condition code instructions 
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Intelligent Controller - High Speed (continued) 

Brief Description of the Additional Elements 

Am2925 Clock Generator and Microcycle Length Controller 

§ General purpose crystal-controlled clock generator/driver 
§ Has a microprogrammable clock cycle length: 

- provides significant speed-up over fixed clock cycle 

- meets a variety of system speed requirements 

® Generates four different simultaneous clock output waveforms 
@ One of eight cycle lengths can be selected by the microprogram 

® System control functions include: 

- run 

- halt 

- single-step 

- initialize 

- ready/wait 

- inputs can determine: • where a halt occurs 

' the end point timing of wait cycles 

• Up to 12 pins can be controlled by the microword. 
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Intelligent Controller - High Speed (continued) 


Brief Description of the Additional Elements (continued) 


Aid2904 Status and Shift Control Unit 


§ Designed to perform all the miscellaneous functions 
which are usually performed in MSI around an ALU 


® It contains three nearly independent blocks of logic: 

- multiplexer to generate the carry-in 

- 4 three-state multiplexers for shift linkage 

- 2 status registers for storing carry, overflow, zero 
and negative status flags. 

These status registers control a condition-test output 
via a condition code multiplexer. 

A wide selection of condition-code test logic is provided. 


In our application only the status registers 
and condition-test logic is used. 
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Intelligent Controller - High Speed (continued) 

Brief Description of the Additional Elements (continued) 
Am2904 Condition Code Output 


^3 - h 

Condition 

Code Output 

0 0 0 0 

(N®0VR)+Z 


0 0 0 1 

(N®0VR)-Z 

★ 

0 0 10 

N@0VR 


0 0 11 

NiOVR 

•k 

0 10 0 

Z 


0 10 1 

Z 

•k 

0 110 

OVR 


0 111 

OVR 

k 

10 0 0 

C+Z 

k 

10 0 1 

c-z 

k 

10 10 

C 


10 11 

C 

k 

110 0 

C+Z 


110 1 

C+Z 

k 

1 1 10 

N 


1111 

N 

k 


* 9 Condition codes not available at the Am29116 CT-output, 
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Intelligent Controller - High Speed (continued) 


Reasons for the Better Performance of the Second Solution 

t Instruction inputs of the Am29116 and the inputs of 

the Am2910 are driven from separate microcode bits. 

- allows simultaneous instruction execution in the Am29116 and 
direct-address branching in the Am2910 

- reguires an additional 12 bits in the microinstruction 

i Multiplexer at the CC-input of the Am2910 (controlled by 2 bits) 

- allows testing of conditions without loading the signals 
into the Am29116 

* T][_4 inputs of the Am29116 driven by 4 additional microword bits 

- allows simultaneous testing and execution of an Am29116 instruction 

a Use of the Am2904 (controlled by 4 bits) 

- improved flexibility in status testing 

@ Multiplexer at the inputs of the Am29116 (needs 6 bits) 

- allows different source and destination addresses in RAM 
in the same microcycle 

© The Am2925 clock generator/driver 

- able to dynamically alter the length of the microcycle 
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Intelligent Controller 
Comparison with an Am2901 - Based Solution 
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Alternate Means of Obtaining Am29116 Features 


Am29116 


Am2901-Based Solution 


® 16-bit data path 


Use 4 Am2901 microprocessor slices and 
1 Am2902 carry-look-ahead generator. 


® 32 registers s Am2901 has 16 registers but is not readily 

expandable. For additional registers use 
Am29(Z)3 or Am29203 processor with 4 Am29705 
16 X 4-bit register file extenders. 


® 16-bit barrel shifter 


Use n cycles to shift data by n bits. 









Alternate Means of Obtaining Am29116 Features (continued) 


Am29116 

® Status register and 

condition code generator/MUX 

9 Byte/Word Mode 


Am2901-Based Solution 


@ Use 2 Am2904 status and shift control units 

- One Ani2904 for the ALU flags (C,N,0VR,Z) 
and condition code generation 

One Am2904 for the 3 user-definable flags 
and the linkage flag 


® Use two output enable bits (OEy). 

Then in byte operations force Wy high 
for slices 3 and 4. 

® Multiplex status lines at slices 2 and 4 


Note: This will permit external byte-mode functions. 
Operations on internal registers will still 
alter all 16-bits. You should choose Am29203 
rather than Am2901 if Byte/Word operations are 
important. When used with a status-flag MUX, 
Am29203 fully supports Byte/Word operations. 
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Alternate Means of Obtaining Am29116 Features (continued) 

Am291l6 Am29(/)1-Based Solution 

® Built-in 16-bit priority encoder ® Use two 8-bit Am25LS2513 priority encoders 

® Masking capability of the ALU ® Use three microcycles: 

- mask the first operand 

- mask the second operand 

- operate on the masked operands 

® Emulate with several microinstructions 

® Can be emulated by a lengthy microcode 
sequence. Will execute too slowly 
for many applications. 


® Bit oriented instructions 
® CRC instructions 
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Intelligent Controller - Using Am2901 

Brief Description of the Additional Devices 
Am2901 4-bit Microprocessor Slice 
@ High-speed cascadable element for use in 

- CPU's 

- peripheral controllers 

- programmable microprocessors 

- numerous other applications 
« Consists of: 

- a 16-word by 4-bit two-port RAM 

- a high-speed 8-function ALU 

plus shifting, decoding and multiplexing sections 

§ Cascadable with either: 

- simple ripple carry propagation 

- the Am2902 look-ahead carry generator 

® Produces 4 status flags (N, OVR, C, Z) 

® accepts 9-bit microinstructions: 

- 3 bits select the ALU operand source 

- 3 bits select the ALU function 

- 3 bits select the ALU destination 

® accepts 8 bits to select the two RAM addresses 
® accepts 2 control bits (OE and carry-in) 
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Intelligent Controller - Using Am2901 

Brief Description of the Additional Devices 

Am2902 High-Speed Look-Ahead Carry Generator 


e 

§ 


Provides anticipated carries across a group of four binary ALU's 

Accepts up to 4 pairs of carry P^ropagate and carry Generate 
signals from an ALU and one carry input. 

With ALU operands A and B and an addition operation: 

P = P3-P2'PrP0 G = G3+P3*G2+P3-P2 ‘Gi+P3-P2’Pi'G0 

P0 = G0 = A0’B0 

Pi = Ai+Bi Gi = Ai"Bi 

^2 " ^2 " ^2‘^2 

P 3 ^ *^3 " ^3*^3 


i Also provides carry-propagate and carry-generate signals 
to use for further levels of look-ahead. 


@ Logic results provided at the outputs are: 
Gn+x "" G0+P0*Cp 
Cn+y = Gi+Pi'G 0 +Pi*P 0 *Cn 

Gn+z = G2+P2’Gi+P2’Pl‘G0+P2'Pi‘P0‘Cn 
G = G3+P3'G2+P3-P2*Gi+P3*P2*Pi'G0 
P = P3‘P2*Pl“P0 
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Intelligent Controller - Using Am2901 

Brief Description of the Additional Devices 


Am25LS2513 Three-State Priority Encoder 
@ Encodes eight Tines to three-line binary 
@ Three-state outputs 

- controlled by three active LOW and two active HIGH inputs 
§ Cascadable 

- provides an input enable and an output enable 

to permit cascading without additional circuitry 
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Intelligent Controller - Using Am2901 


Thus we have seen that 
with the same features 
level of integration. 


it is rather hard to build a controller 
as the AM29116 from parts with a lower 


On the next page we see a simple disk controller 
based on a pair of Am2901A's. 
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The Microword for the 2901 - B ased Solution 
- 48 bits wide 

. 19 bits Am2901 instruction (M47 - M37, M35 - M28) 

- 3 bits function select (FCN) 

- 3 bits source (SRC) 

- 3 bits destination (DST) 

- 4 bits A port register address 

- 4 bits B port register address 

- 1 bit carry in (Cjj) 

- 1 bit output enable (OE) 

8 bits M-bus control (M27 - M20) 

- 4 bits bus source (BUS SRC) 

- 4 bits bus destination (BUS DST) 

. 11 bits sequencer (M36, M17 - M8) 

- 4 bits Ani2911 instruction (SEQ INSTR) 

- 4 bits condition code select (TEST) 

- 1 bit polarity of CC (POL) 

- 2 bits to determine the page of microprogram memory 

. 8 bits data (M7 - MO) 

. 2 bits additional control (M19, M18) 

- 1 bit increments MAR (INC MA) 

- 1 bit initiate data transfer (TRAN) 
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Intelligent Controller 
Very High Speed Solution 
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Intelligent Controller - Very High Speed 


Some General Considerations 

9 Interface-signal names, polarities and functions 
used here are similar to those used in the current ANSI 
standard for hard-disk drives. 


§ The methods and functions discussed here can be used for most 
current hard- or flexible-disk drives. 

f With minimal external logic, this controller uses an Am29116 
and an Am9520 burst error processor to perform all of the 
functions needed to: 

- write and read at 30 Mbits per second 

- format a disk 
Including: 

- searching a track for a specific header and sector 

- managing data flow through a high-speed buffer memory 

- generating modified Fire-code check bits while writing 

- detecting and correcting single and burst errors on reading 

- generating and checking of CRC's in sector headers 
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Intelligent Controller - Very High Speed (continued) 

Ani9520 Burst Error Processor 

Distinctive Characteristics: 

§ Provides for detection and correction of burst errors: 

- detects errors in serial-data blocks up to 585K bits long 

- allows correction of error bursts of up to 12 bits long 

s Effective data rates up to 20 Mbits per second: 

- fast enough for high-performance hard disk systems 

8 Four selectable industry-standard polynomials: 

- popular IBM 56- and 48-bit polynomials 

- also 35-bit and 32-bit polynomials 

8 Three correction algorithms provide flexibility: 

- full-period clock-around method compatible with current practice 

- Chinese Remainder Theorem method reduces correction time 
by orders of magnitude 

- reciprocal polynomial method for correction 
with the 48-bit IBM code 

• Designed for use in disk controllers and communication systems 
based on fixed-instruction-set or microprogrammed processors. 
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Intelligent Controller - Very High Speed (continued) 


Ani9520 Burst Error Processor (continued) 


Functional Description: 


@ Register Array 

- consists of 56 flip-flops used for 
. check-bit computation during write operations 
. syndrome computation during read operations 
. error pattern extraction during error correction operations 


s Polynomial-Divide Matrix: 

- establishes interconnections and feedback for a group 
of shift registers such that an entire byte of data 

is divided in a parallel operation by the selected polynomial 

- the matrix is controlled by 2 Polynomial Select 
and 2 Function Select inputs 

- the data is presented to the matrix a byte at a time 
on 8 data lines 


9 When correction operations are complete, the error 
pattern is available on 12 outputs: 

- eight bits on the Q 0 -Q 7 outputs 

- four bits on the LP 0 -LP 3 outputs 
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Intelliqent Controller - Very High Speed (continued) 


Am9520 Burst Error Processor (continued) 


i Write: 

- While the data is being written, the Ani9520 is in the 
Compute-Check-Bits mode, calculating the polynomial remainder 
without affecting the flow of data to the disk 


- After the last data byte, the Am9520 is switched into the 
Write-Check-Bits mode, outputting the 4, 5, 6 or 7 check bytes. 

- These check bytes constitute additional information 
to be appended to the data stream to allow detection 
and correction of errors on reading. 


s Read: 

- Two modes are available when reading: Normal and High Speed 

- The two modes use different correction algorithms. 

- After the last information byte has been read, the state of 
the ER output signal indicates whether an error has occured. 
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Intelligent Controller - Very High Speed (continued) 


Am9520 Burst Error Processor (continued) 


© Correction: 

- After the read operation, the syndrome in the register array 
contains information specifying: 

' the location of the error 

• the bit pattern of the error 

© Normal Correction Mode: 

- The error location is found by counting the number of clock 
pulses required to make the EP output go HIGH. 

- The error pattern available on LP 0 -LP 3 and Q 0 -Qy can 

be Exclusive ORed with the data to effect the correction. 


© High-Speed Correction Mode: 

- The error location is found by counting the number of clock 
pulses required to generate an indicator for each of the 

2 or 4 factors of the polynomial. 

- The error pattern available on LP 0 -LP 3 and Q 0 -Q 7 can 

be Exclusive ORed with the data to effect the correction. 

- While there are more steps to the high-speed-correction procedure 
than are required by the "Normal" procedure, the correction is 
accomplished far faster. 

- The high-speed correction method is not available for the 48-bit 
polynomial. 

- For the 56-, 35- and 32-bit polynomials, 

the high-speed method should be preferred over the "normal method". 
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Intelligent Controller - Very High Speed (continued) 
Am9520 Burst Error Processor (continued) 


Computing Check Bits: 

f The Polynomial Divide Matrix and the Register Array implement 
the familiar serial form of feedback shift register arrangement 
in an 8-bit parallel form. 

9 The $i and inputs select one of 4 polynomials; 


Polynomial 

Degree & 
Number of 
Check Bits 

Period 

(Bits) 

Correctable 
Burst Error 
Length (Bits) 

(x^^+1)’(x^^+x^+x^+x+l)• 
(xl2+xll+xl0+ +X+1)’ 
^xll+x^+x^+x^+xS+x+i) 

56 

585,442 

11 

(x^^+1)*(x^^+x^+l) 

32 

42,987 

11 

(x^^+1)*(x^^+x^ ^+x^+x^+x^+x+l) 

35 

94,185 

12 

(x^^+1)‘(x^^+x^^+x^+x^+l) 

48 

13*(235-1) 

=4.466...xl0^^ 

7 


When the last data byte has been read or written, the Register Array 
contains the check bits. 


Remember to select the same polynomial for reading as was used for writing. 
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Intelligent Controller - Very High Speed (continued) 
Ani9520 Burst Error Processor (continued) 


Computing Check Bits (continued) 

s Sequence of events to compute the check bits 

i) The clock input,CP, should be in the quiescent HIGH state. 

ii) Initialize by activating the master reset input, MR, LOW 
and return it to HIGH. 

iii) Specify the desired polynomial via S 0 , S^. 

Apply zeroes to C 2 -C 0 to select Compute-Check-Bits mode. 

iv) Establish a byte of data on 00 - 0 ; inputs. 

v) Make the clock input go LOW then HIGH. 

vi) Repeat from step iv) until all data bytes are entered. 
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Intelligent Controller - Very High Speed (continued) 
Am9520 Burst Error Processor (continued) 


Write Check Bits: 

® When the Write-Check-Bits mode is established, the feedback 
paths of the register array are disabled and the check bits 
may be shifted out. 

® Checkbits are available on the Q 0 -Q 7 outputs one byte at a time. 

® Sequence of events to obtain the check bits: 

i) The clock input, CP, should be in the quiescent HIGH state, 

ii) Select the polynomial via the S 0 ,Si inputs, 

iii) Force C 2 ,C]^,C 0 to LOW LOW HIGH (Write Check Bits). 

iv) After a propagation delay the Q 0 -Q 7 outputs will 
contain the, first check byte. 

v) Make the clock input go LOW then HIGH. The next check byte 
will be available on the Q 0 -Q 7 outputs. 

vi) Repeat from step v) until all check bytes are read out. 
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Intelligent Controller - Very High Speed (continued) 


Am9520 Burst Error Processor (continued) 


Read - General: 


t The input stream (data and check bytes) is divided by the 
selected polynomial to obtain the syndrome. 


« A non-zero syndrome indicates an error has been detected. 
If the syndrome is not zero the ER output will be HIGH. 


s Two methods for error correction are available: 

- full-period clock-around ("Normal") 

- Chinese Remainder Theorem ("High Speed Method") 


e There is a different read procedure for each of these methods: 

- Read Normal: produces one syndrome 

- Read High Speed: produces as many syndromes as the 
number of factors in the polynomial. 

The input stream is simultaneously divided by all of the 
factors of the polynomial. The ER output indicates 
whether or not all syndromes are zero. 


« Read High Speed is not available for the 48-bit polynomial 
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Intelligent Controller - Very High Speed (continued) 
Ani9520 Burst Error Processor (continued) 


Read Normal: 

s Sequence of events for Read Normal: 

i) The clock input should be in the quiescent HIGH state. 

ii) Initialize by pulling the master reset, MR, LOW 
and then returning it to HIGH. 

iii) Select the required polynomial via S 0 ,S]^ inputs. 

iv) Apply LHL to C 2 ,C],,C 0 to select Read Normal. 

v) Present a byte of information as read from the disk 
to the D^-Dy inputs. 

vi) Make the clock go LOW then HIGH. 

vii) Repeat from step v) until the last check byte read 
from the disk is processed. 

viii) Test the ER output: 

- ER HIGH: an error has been detected. 


- ER LOW: 


no error has been detected. 
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Intelligent Controller - Very High Speed (continued) 
Am9520 Burst Error Processor (continued) 


High Speed Read: 

s Sequence of events for Read High Speed: 

i) The clock input, CP, should be in the quiescent HIGH state. 

ii) Specify the polynomial via the S 0 ,Sj^ inputs. 

iii) Apply LHH to 02 , 0^,00 to select Read High Speed. 

iv) Initialize by pulling the master reset, MR, LOW 
and then returning it to HIGH. 

v) Present a byte as read from the disk to the D 0 -D 7 inputs. 

vi) Make the clock go LOW then HIGH. 

vii) Repeat from step v) until the last check byte has been read, 

viii) Test the ER output: 

- ER HIGH: an error has been detected. 

- ER LOW: no error has been detected. 
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Intelligent Controller - Very High Speed (continued) 


Am9520 Burst Error Processor (continued) 


Function Select Codes 


^2 

Cl 

C0 

Function 

L 

L 

L 

Compute check bits 

L 

L 

H 

Write check bits 

L 

H 

L 

Read normal 

L 

H 

H 

Read high speed 

H 

L 

L 

Load 

H 

L 

H 

Reserved 

H 

H 

L 

Correct normal 

H 

H 

H 

Correct high speed 
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Intelligent Controller - Very High Speed (continued) 


Ani9520 Burst Error Processor (continued) 

I 

Correct Normal: 

§ The Am9520 manipulates the syndrome to yield: 

- error pattern (at Q^-Q; and LP 0 -LP 3 outputs) 

- error location (needs further external computation) 

s Syndrome is repeatedly divided by the polynomial until 
the error pattern is located: 

- done by repeatedly clocking without regard to the D 0 -D 7 inputs 
until EP goes HIGH 

§ If the AE output goes HIGH while the EP output remains LOW, 
an alignment exception has been detected. 

9 Count clock cycles (#C) until EP goes HIGH. 

® If #C > (period of polynomial) then the error is uncorrectable. 
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Intelligent Controller - Very High Speed (continued) 


Am9520 Burst Error Processor (continued) 

Correct Normal: (continued) 

i Use two external counters (Rl, R2) 

- Rl counts the number of cycles until AE goes HIGH 

- R2 counts the number of cycles from AE going HIGH 
to EP going HIGH. (= 0 if no alignment exception) 

® If R1+R2 > (period of polynomial) then the error is uncorrectable. 


® (N*K - 8*R1 - R2) gives the location of the first bit in the 
error burst counting from the last check bit of the record 
for the 56-bit and 32-bit polynomials. 

(N'K - 8*R1 - R2+5) gives the location of the first bit in the 
error burst counting from the last check bit of the record 
for the 35-bit polynomial. 

where: 

K is the smallest +ve integer that makes this expression +ve. 
and N is the period of the polynomial. 


Note: The 48-bit polynomial uses another correction algorithm. 
See specification sheet for details. 
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Intelligent Controller - Very High Speed (continued) 


Am9520 Burst Error Processor (continued) 


Burst Error Processor (continued) 
Correct High Speed: 


@ This mode allows you to determine the error pattern 
in far fewer clock cycles than does the "Normal" mode. 

- A polynomial with m factors with periods P]^,P 2 ,..., P^, 
will correct in no more than the following number of 
of clock cycles: 

. In Normal mode: ‘'^m product of the P's 

. In High Speed mode: P]^+P 2 + ... +P[j, i.e. the sum of the P's 


i Number of syndromes equals the number of factors of the polynomial. 

s Refer to the table a few pages back which shows the factorization 
of the polynomials. 

As we have written the factors, the first factor has a special 
signifigance. The degree of this first factor determines the 
maximum length of an error burst that is still correctable. 
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Intelligent Controller - Very High Speed (continued) 


Am9520 Burst Error Processor (continued) 


@ The error location is given by: 
L = N-K - (AlMl + A2M2 + ... + 


Where N is the natural period of the polynomial. 

K is the smallest integer that makes L positive, 
are the numbers of clock cycles required to match 
the error pattern of each factor, 
and A^ are the Chinese Remainder Theorem coefficients: 


Polynomials 

A1 

A2 

A3 

A4 

56-bit 

452,387 

578,864 

2,521,904 

2,647,216 

32-bit 

311,144 

32,760 

— 

— 

35-bit 

32,760 

720,728 

-- 

— 
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Intelligent Controller - Very High Speed (continued) 


Am9520 Burst Error Processor (continued) 

Correct High Speed (continued) 

i To determine and the error pattern use: 

- P 0 -P 3 inputs to select the register section to be clocked 

- PM 2 -PM 4 , the "pattern match" outputs: 

When PM^ is HIGH then the syndrome^ shows a match 
with the error pattern. 

@ For Ml: use the polynomial-shift controls to select 
register-section #1 (P 0 sPi 5 P 2 >P 3 ~ HLLL). 

Use the same procedure as in the correct normal mode. 

- If R1+R2 > (period of factor #1) then error is uncorrectable. 

- If error is correctable then Ml = 8 Rl + R2 

i The M^ will be determined after Ml but slightly differently: 

- Select the i^*^ register section via P 3 -P 0 

- While PM^ is LOW clock the 9520 and increment a counter 

- If the count exceeds the period of this factor the error 
is not correctable. 

§ When EP and all PM^'s associated with this polynomial are HIGH, 
then the error pattern and error location are determined. 
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Intelligent Controller - Very High Speed (continued) 


Am9520 Burst Error Processor (continued) 


Am9520 Polynomial Periods 


Polynomial 

Period 
Factor 1 

Period 
Factor 2 

Period 
Factor 3 

Period 
Factor 4 

Composite 
Period (N) 

56-bit 

22 

13 

89 

23 

585,442 

32-bit 

21 

2047 

— 

— 

42,987 

35-bit 

23 

4095 

— 

— 

94,185 


Note: The 48-bit polynomial requires the use of 

a different correction procedure and is not shown here. 
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Intelligent Controller - Very High Speed (continued) 

System Organisation : 

@ Interface to the disk drives: 

- bit-serial data paths for read and write 

- byte-parallel paths for 
. commands 

. disk addresses 
. disk status 

€ Am2910 sequencer and Am27S35 registered PROM microprogram 
memory drives 76-bit control bus: 

a Data flow is: 

- asynchronous & serial at 30 Mbits/sec from drive to a FIFO array 

- 16-bit clocked & parallel from the FIFO array to the buffer memory 
at 20 Mbits per second via the internal data bus. 

® The BEP is located on the internal data bus. 

i Disk read: 

- Data is read into the FIFO array at 30 Mbit per second. 

- Concurrently, the data is transferred from the buffer memory 
to the BEP at a rate of 15 MHz. 

@ Disk write: 

- The BEP pre-calculates the check bits before the write. There is 
not enough time to overlap BEP/buffer and buffer/FIFO transfers. 
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Intelligent Controller - Very High Speed (continued) 


Microinstruction Format 


i 80-bit wide microinstruction 

- This is not a minimum width, but it demonstrates 
microcoding in a straightforward manner. 


§ Sample microcode for uncompressed sector read/write 
operations is available including: 

- header and sector searching 

- error checking of the header via Am29116 CRC instructions 

- error checking and correction of the data segments 

via Am9520 and its 56-bit modified Fire code polynomial 


@ The sector input/output microroutine (SECTIO) performs input 
or output of a single 256-byte sector. 
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Microinstruction for the Very High Speed Controller 




Microinstruction Field 


Bits 

Width 

Mnemonic 

Comment 

79-64 

16 

Il5-l0 

Am29116 Instruction 

63-60 

4 

T 4 -T 1 

Am29116 Conditional Test Select 

59 

1 

SRE 

Am29116 Status Register Enable 

58 

1 

M'y 

Am29116 Output Enable Y-Bus 

57 

1 

lEN 

Am29116 Instruction Enable 

56 

1 

OLE 

Am29116 Data Latch Enable 

55-52 

4 

I 3 -I 0 

Am2910 Instruction 

51-42 

10 

Dg-D 0 

Am2910 Direct Input 

41-36 

6 

— 

Condition-Code MUX Selection & Polarity 

35 

1 

ADMC 

Address Mark Control 

34 

1 

BFCB 

(Enable Memory) Bus from (Disk Drive) Control Bus 

33 

1 

BFTP 

(Enable Memory) Bus from Translate PROM 

(Translate PROM needed in data compression operations) 

32 

1 

BF03 

(Enable Memory) Bus from 9403A FIFO array 
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Microinstruction Field 


Bits 

Width 

Mnemonic 

Comment 

31 

1 

BF16 

(Enable Memory) Bus from Am29ll6 Y-Bus 

30 

1 

BF2L 

(Enable Memory) Bus Lower Byte from Am9520 Q-Bus 

29 

1 

BF2U 

(Enable Memory) Bus Upper Byte from Am9520 Q-Bus 

28 

1 

BOUT 

Bus Direction OUT 

27 

1 

BT03 

(Enable Memory) Bus to 9403A FIFO Array 

26 

1 

BT16 

(Enable Memory) Bus to Am29116 Y-Bus 

25 

1 

BT2L 

(Enable Memory) Bus Lower Byte to Am9520 D-Bus 

24 

1 

BT2U 

(Enable Memory) Bus Upper Byte to Am9520 D-Bus 

23 

1 

BT20 

(Enable Memory) Bus to Am9520 REP, P3-P0, C2-C0 

22 

1 

CE2L 

Clock Enable Am9520 to Lower-Byte Bps Interface Register 

21 

1 

CE20 

Clock Enable Memory Bus to Am9520 Interface Registers 

20 

1 

CP20 

Clock Pulse for 9520 (Microcoded Waveform) 

19 

1 

CREQ 

Command Request 

18 

1 

INPT 

(Enable Serial Data) Input to 9403A FIFO Array 
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Micro instruction for the Very High Speed Controller (continued) 


Microinstruction Field 


Bits 

17-16 

15 

14 

13 

12 

11 

10 

9 

8 

7 

6 

5 

4 

3 


2-0 


Width 

2 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

3 


Mnemonic 

JMPI 

MADR 

MREA 

MWRT 

OUPT 

Fenb 

PFPM 

PFM 

PL03 

PREQ 

RDGA 

RFIF 

SAW 

W^ 

XLAT 


Comment 

(Enable) Jump Indirect Am29116 Y-Bus (double rail) 
(Enable Loading of Buffer) Memory Address Register 
(Enable Buffer) Memory Read 
(Enable) Memory Write Operation 

(Enable Serial Data) Output from 9403A FIFO array 
Parameter Enable 

(Enable Setting of Am9520) P Bits from Am9520 PM Bits 
(Enable) Parallel Fetch from 9403A FIFO array 
(Enable) Parallel Load of 9403A FIFO array 
Parameter Request 
Read Gate 

Reset 9403A FIFO array 
Select/Attention Strobe 
Write Gate 

Translate Table Select for Data Compression PROM 
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Intelligent Controller - Very High Speed (continued) 


Conclusion 


What makes this controller so fast? 


s The Am29116 microprocessor has been combined 
with the Am9520 burst error processor. 

- This provides the powerful Am29116 instruction set 
and the very effective hardware elements of the Am29116: 

. its CRC logic 
. 32 registers 
. barrel shifter 
. priority encoder 


- The Am9520 is a relevant specialized device which: 

. generates the checksum 

. checks the data together with the checksum 
and produces both the error pattern and error location 

. is very fast (due to its special hardware). 

You can calculate a CRC remainder - 

on an Am9520 at 50nsec per data byte . 
on an Am29116 at 200nsec per data bit . 

(The Am9520 is 32x faster than the Am29116!). 

® The Am9520 is unique in supporting the very fast 
Chinese Remainder Theorem method that greatly speeds 
the correction of a faulty sector. 

This method is fully supported by hardware for use while 
reading and in the correction calculation itself. 
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Intelligent Controller - Very High Speed (continued) 


Conclusion (continued) 

® We can write special microcode: for example for packing 
ASCII fields. 

This is not possible with a fixed-instruction-set processor 
or a specialized LSI disk-controller chip. 


® We are making use of the parallel hardware of the 9520. 

For example, the Am9520 can perform up to four simultaneous 
polynomial division operations and produce four independant 
syndromes concurrently. 


® We are concurrently reading from the disk and generating 
the polynomial remainders for error detection and correction. 

(We were not able to support checkbit calculation 
in parallel with writing, however). 


9 We have used a FIFO array to allow us to read a data stream 
that is faster than even an Am9520 can check it. 


§ We have incorporated a buffer memory that: 

- holds images of the last eight sectors read from or 
written to disk 

- holds 1/0 request queues to maximize throughput 

- holds additional house keeping tables 


@ We have used a PROM to translate from EBCDIC data 
to packed-BCD or ASCII 
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Application of Am29116 to General Purpose CPUs 
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A Microprogrammed CPU Using Am29116 


The following pages provide an introduction to 
AMD Application Note MPR-1712 
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A Microprogrammed CPU Using Am29116 

Introduction and System Overview 

e Am29116 is optimized for peripheral controller applications 

f However, Am29116 is also an ideal choice for CPU's as well. 

- it has a powerful instruction set for: 

. arithmetic operations 
. data movement 
. multiple-bit shifts 
. bit manipulation 
. status manipulation 

- it has high speed (100 nsec cycle time) 

- it can reduce power requirements 

- it save area on PC boards 

@ We will describe a CPU built with the Am29116 which maintains 
architectural and software compatibility with the Super-16.* 

s This CPU incorporates pipelining at 

- the microprogram level 

- the macroinstruction level 

* As described in detail in 

"Bitslice Microprocessor Design" by Mick and Brick, Chapter 9 







Central Processing Unit Block Diagram 
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A Microprogrammed CPU Using Am29116 (continued) 


System Organization 

This is a simple system comprised of: 

® 16-bits-wide main memory built from static RAM chips 
t Am29116 processor and CCU 

® A simple bus structure: 

- can be modified to accomodate interface signals 

- but to add other 1/0 devices, a bus controller is needed 
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System Organization (continued) 


Interface Signals between the Memory and the CPU 



• handshaking over three busses 

- 16-bit-wide address bus 

- 16-bit-wide bidirectional data bus 

7-bit-wide control bus 

• memory request 

• read/write 

• address accepted 

• data strobe 

• data synch 

• interrupt control lines 
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n-l»CYCtE n*»’CYCLE n + t* CYCLE 
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A Microprogrammed CPU Using Am29116 (continued) 


Memory Read (continued) 


§ To use the data in the n+1^*^ cycle, the Am29116 
generates the main memory address during the 
(n-1)^^ cycle. 


s Data is read during the n^^ cycle. 

9 The n^^ cycle must be stretched 
to accomodate main-memory READ timing. 

9 The signal to stretch the n^^ cycle is provided 
to the Am2925 clock generator during the (n-1)^*^ cycle. 
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A Microprogrammed CPU Using Am29116 (continued) 


Instruction Formats 
(same as for Super-16) 

i One-word instructions (16 bits): 


15 87430 


- register to register 

- register storage 

- storage to storage 


§ Two-word instructions 


RR 

RS 

SS 

(32 


OP 

R1 

R2 


OP 

R1 

X2 


OP 

XI 

X2 


bits): 


15 


0 15 


0 


register indexed storage RX 
register storage immediate RX 


OP R1 X2 

d 


OP R1 X2 

d 


@ 4-bit register address defines one of 16 registers: 

- uses lower half (R0 - R15) of the 32 registers in the Am29116 
as user registers 

- upper half (R16 - R31) used by the operating system 
(stack pointer, counter, etc.) 
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A Microprogrammed CPU Using Am29116 (continued) 


Instruction Formats (continued) 

§ 8-bit opcode specifies 256 instructions 

- includes information about the addressing mode 

- PUSH/POP operations 

- I/O instructions 

- decimal and binary integer arithmetic 

« Data types 

- bit 

- nibble 

- byte 


- word 
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A Microprogrammed CPU Using Am29116 (continued) 


CPU - Architecture 

i Internal data transfers use 16-bit wide internal bus 

i Data is transferred between the system bus and 
the internal CPU bus on three paths: 

- data register 

- address register 

- instruction lookahead register (Z-latch) 

@ Pipelining 

- microlevel: pipeline register at the output 

of the microprogram memory 

(for overlapped instruction fetch and execution) 

- macrolevel: instruction register (IR) and Z-latch 

. decodes the macroinstruction in the IR 

. next macroinstruction, displacement field 
or data in the Z-latch 
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A Microprogrammed CPU Using Am29116 (continued) 

Macroinstruction Data Path 

@ Macroinstruction can move from main memory to 

- the Z-latch or 

- immediately to the IR (by making the Z-latch transparent) 

i Load IR directly: 

- during pipeline-fill operation 

- on instruction after a two-word instruction 

s Decoding of (IR) determines the meaning of (Z); 

- next instruction if (IR) is a RR, RS or SS instruction 

- displacement if (IR) is a RX instruction: 

A displacement is moved into the Am29116 via its Y-bus 
in the cycle in which the operand address is formed. 

- immediate data (used in the execution cycle) 

e Am29116 can input and output data in the same microcycle: 

- data passes into the D-latch in 1^^ half of cycle 

- In the second half of the cycle, the D-latch is disabled 
& the ALU result goes out to internal bus via the Y-bus. 

® N-register can be used for: 

- N-way branching or for normalization (use prioritize instr'n) 

- "n" in the rotate-by-n instruction 
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A Microprogrammed CPU Using Am29116 (continued) 


Microprogram Control 

§ Am2910 sequencer generates address for next microinstruction 

- branch address sources: 

. pipeline register 

. interrupt vector decoder 
. macroinstruction decoder (mapping PROM) 

. N-register 

- Conditions for branching can come from: 

. Am29116 condition-test output 

. Am2904 condition-test output. Provides: 

- for extended set of branching conditions 

- for saving conditions generated previously 

(a useful result of having 2 status registers) 

- Am29116 T-bus (carry, overflow, negative, zero) 

- Interrupt request accepted by an 
Am2914 interrupt controller request 
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A Microprogrammed CPU Using Am29116 (continued) 

Timing Analyses (continued) 

Path Computations 

Path 1 (IR —> pipeline) 



From 

To 


Instruction register 

CP 

Q 

13 ns 

Mapping PROM 

ADD 

Y 

40 ns 

Sequencer 

D 

Y 

20 ns 

Micromemory 

ADD 

Y 

40 ns 

Pipeline register 

set up 


5 ns 




118 ns 

Path 2 (CC-flip flop 

--> CC-MUX control) 



From 

To 


CC flip flop 

CP 

Q 

13 ns 

Sequencer 

CC 

Y 

43 ns 

Micromemory 

ADD 

Y 

40 ns 

CC-MUX 

sel 

Y 

15 ns 

CC-flip-flop 

set up 


5 ns 


116 ns 
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A Microprogrammed CPU Using Am29116 (continued) 
Timing Analysis (continued) 

Path Computations (continued) 


Path 3 (pipeline register 

—> pipeline 

register) 


From To 


Pipeline register 

CP Y 

13 ns 

Sequencer 

I Y 

70 ns 

Micromemory 

ADD Y 

40 ns 

Pipeline register 

set up 

5 ns 



128 ns 


Path 4 (pipeline register —> CC-f1ip-flop) 



from 

to 


Pipeline register 

CP 

Y 

13 ns 

Three-state gate 

enable 

Y 

29 ns 

Am29116 (preliminary) 

I 

CT 

47 ns 

CC - MUX 

^in 

Y 

15 ns 

CC flip flop 

set up 


5 ns 




109 ns 
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A Microprogrammed CPU Using Am29116 (continued) 

Timing Analysis (continued) 

Path Computations (continued) 

Path 5 (pipeline register —> data register) 



From 

To 


Pipeline register 

CP 

Y 

13 ns 

Three-state gate 

enable 

Y 

29 ns 

Am29116 (preliminary) 

I 

Y 

88 ns 

Data register 

set up 


5 ns 




135 ns 

Path 6 (PC and STACK —> 

pipeline 

register) 



From 

To 


Am2910 (PC and STACK) 

CP 

Y 

100 ns* 

Micromemory 

ADD 

Y 

40 ns 

Pipeline register 

set up 


5 ns 




145 ns 


* It is assumed that the previous instruction could 
produce no change in the counter or could only 
decrement the counter 
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A Microprogrammed CPU Using Am29116 (continued) 


Timing Analysis (continued) 
@ Most critical path was 



From 

To 


Pipeline register 

CP 

Y 

13 ns 

Am29116 

I 

CT 

47 ns 

CC MUX 

D 

Y 

15 ns 

Sequencer 

a 

Y 

43 ns 

Micromemory 

ADD 

Y 

40 ns 

Pipeline register 

set up 


5 ns 


163 ns 


@ Introducing the CC-Flip-Flop separates the cycle time for 
that path into two non-critical paths (4 and 2). 

® Since the CC-FF delays the CC signal by one clock, the 
CC-MUX-select lines are driven directly from the microprogram 
memory rather than the pipeline register. This tactic 
re-aligns the selection of the condition code with the 
execution of the microinstruction. 
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A Microprogrammed CPU Using Am29116 (continued) 

Macroinstruction Execution 

9 4 basic sequences of operations: 

- Form Memory Address of Instruction: (1 microcycle) 

PC, MAR <— PC + 2 

- Fetch Instruction: (1 microcycle) 

generate Main-Memory-Request and Read-Strobe 
bus <-- ((MAR)) 

IR <-- (bus) at the next rising edge 

- Decode Instruction: (1 microcycle) 

IR <— Z-latch 

PROM generates starting address for the microprogram 

- Execute: 

Am29116 performs the specified operation on the operands. 

The number of microcycles depends upon the operation. 

® 2 extra steps are needed for instructions with memory operands 

- Form Operand Address: (1 microcycle) 

MAR <-- (X) + d using the 'd' in the Z-latch 

- Fetch Operand: (1 microcycle) 

Z <-- ((MAR)) or D <— ((MAR)) 


ADVANCED MICRO DEVICES ^ 




6-190 


ED29116 


6-190 


A Microprogrammed CPU Using Am29116 (continued) 


Pipelining at the Macrolevel 


i There is very little scope in this configuration 
for additional macrolevel pipelining: 

- You could use a second Am29116 as the basis for a 
Program Control Unit. 

That is, you would use one Am29116 as an ALU 
and the second Am29116 as a PCU. 

- You could use some other processor as a basis for 
this PCU. 

8 A PCU would increase throughput by introducing parallelism 
in the advancing of the program counter, and in stack 
operations. 

9 The present configuration already incorporates some 
concurrency, however. 
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A Microprogrammed CPU Using Ani29116 (continued) 
Pipelining at the Macrolevel (continued) 
Overlapping of Register to Register Instructions 



Form instruction 
Address 

A 

B 

1 

1 

1 

0 

1 

■ 

D 

1 

1 

■ 

■ 

■ 

■ 

1 

■ 

■ 

■ 

■ 

1 

PC + 2 ^ PC 

PC + 2 MAR 

mn 


fR 

A 

■ 

1 

■ 

1 

1 

■ 




■ 

Z 

E 



1 



■ 

■ 

1 

PC + 2 MAR and 

Load Z Latch or IR 

Decode 



■ 

1 

■ 

IR 

B 

1 

■ 

IR 

C 

1 


IR 

D 




1 



■ 

■ 


Decode Instruction and 
Load Pipeline Register 

Form Operand 
Address 



1 

1 

■ 

1 

1 

■ 


1 


■ 




1 



■ 

■ 


Z -f Index 

Register MAR 























Load Data Register 

Execute 




A 

A 


B 

B 


C 

C 


D 

D 










Y 

Y 


Y 

Y 

Y 

Y 

Y 

Y 

Y 

Y 

Y 

Y 

Y 








The Am29116 Usage 


A, B, C, D are Register to Register type instructions, 
Z = Z Latch 

IR = Instruction Register 


9 Cycle 2: pipefill operation (directly load IR) from memory 
while Am29116 forms next instruction address 

® Cycle 4,5: in single-port Am29116, RR instr. needs 2 cycles 

• Cycle 6: IR <-- (Z) and map into microaddress with a PROM 
while Am29116 forms next instruction address 

8 Cycle 3 is the only cycle in which the Am29116 is idle. 

i After the first 5 cycles, every third cycle produces a result. 
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A Microprogrammed CPU Using Ani29116 (continued) 
Pipelining at the Macrolevel (continued) 
Overlapping of Register to Index Storage Instructions 


1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 

Form Instruction 
Address 

A 

Ad 



B 


Bd 



C 


Cd 










PC + 2-* MAR 

PC + 2 PC. 

Fetch 

Instruction 


IR 

A 

Z 

Ad 



IR 

0 


Z 

Bd 



IR 

C 


Z 

Cd 









(PC + 2 — MAR 
and PC)* 

Load IR or Z Latch 

Decode 



A 




B 





C 










PC + 2 MAR and PC 
Decode and Load 
Pipeline Register 

Form Operand 
Address 




A 





B 





c 








Z + Index 

Register ^ MAR 

Fetch 

Operand 





A 





B 





c 







PC + 2 -> PC and MAR 
Load Operand in 

Data Register 

Execute 






A 





B 





c 








Y 

Y 


Y 

Y 

Y 

Y 


Y 

Y 

Y 

Y 


Y 

Y 

Y 






The Am29116 Usage 


*For pipefill operation only. A, B C are Register to Index storage type instructions. 

Aq, Bq, Cd are displacement. 

Z = Z Latch 

IR = Instruction Register 


0 Cycle 2; Form Aq creates address to fetch displacement, 'd'. 

® Cycle 3; Decoding determines if Z contains a displacement. 

@ Cycle 6: Two-word instructions use both Z-latch and IR: 

- Instruction is directly loaded into IR. 

- Execution of A prevents formation of address of displacement, 'd'. 
(a PCU would save a cycle here). 

0 Every fifth cycle the Am29116 is idle. 

® Every fifth cycle produces a result 
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A Microprogrammed CPU Using Am29116 (continued) 
Pipelining of the Macrolevel (continued) 
Overlapping of Branch-on-Condition RX Type Instructions 


OP 

M 

X2 

d 


M is the condition 
(X2)+d is the address 


1 1 2 3 4 5 S 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 

Form Instruction 
Address 

A 




B or 

K 

Bo 

Kd 



B+2 

K+2 


B+2d 

K+Zd 











PC + 2 ^ MAR 

PC + 2 -> PC 

Fetch 

Instruction 


tR 

A 

Z 



IR 

B or 

K 

z 

Bd 

Kd 



IR 

B+2 

K+2 


Z 

B+Zd 

K+2d 










(PC + 2 -> MAR 
and PC)* 

Load IR or Z Latch 

Decode 



A 




B 

K 




B+2 

K+2 











PC + 2 -> MAR and PC 
Decode and Load 
Pipeline Register 

Form Operand 
Address 








B 

K 





B+2 

K+2 









Z + Index 

Register -> MAR 

Fetch 

Operand 









B 

K 





B+2 

K+2 








PC + 2 -» PC and MAR 
Load Operand In 

Data Register 

Execute 




A* 






B 

K 





B+2 

or 

K+2 









Y 

Y 


Y 

Y 

Y 


Y 

Y 

Y 

Y 


Y 

Y 

Y 







The Am2gi16 Usage 


•During this cycle decision to branch takes place 
If condition is true, Address = K = Index Reg + Aq 
If condition is false, Address = B = A+1 

Z = Z Latch 

IR = Instruction Register 


® Cycle 5: the result of the execution in cycle 4 held by the Am2904 determines 
whether the Am29116 will issue the inline or the branch address 


ADVANCED MICRO DEVICES O 





6-230 


ED29116 


6-230 


A Microprogrammed CPU Using Am29116 (continued) 

Microword Format 

§ 78-bit wide microword ® Control bits for each functional unit are grouped 


Field Width 

Mnemonic 

Description 

ALU 



16 

^0-^5 

29116 Instruction 

1 

DLE 

29116 Data Latch Enable 

1 

lEN 

29116 Instruction Enable 

1 

OEY 

29116 Output Enable Y-bus 

1 

SRE 

29116 Status Register Enable 

1 

OET 

29116 Output Enable T-bus 

3 

RAMSRC 

29116 l 0 -l 4 Source Select 

2 

NSRC 

29116 I 9 -I 12 Source Select 

Data Path 



4 

DSEL 

Data Register Source/Destination Select 

1 

DLD 

Data Register Enable 

1 

MARLD 

Memory Address Enable 

1 

IRLD 

Instruction Register Enable 

1 

ZLD 

Z-Latch Enable 

1 

NLD 

N-Register Enable 

1 

MAP 

Mapping PROM Output Enable 

1 

VECT 

Interrupt Vector PROM Output Enable 
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A Microprogrammed CPU Using Ain29116 (continued) 
Microword Format (continued) 


Field Width 

Mnemonic 

Description 

Memory Control 



1 

R/W 

Memory READ/WRITE Pulse 

1 

WREO 

Wait Request 

1 

DATASTB 

Data Strope 

1 

MEMREQ 

Memory Request 

Interrupt Control 



4 


2914 Instruction 

1 

INTO 

2914 Interrupt Disable 

Clock Select 



3 

^1-^3 

2925 Clock Length Select j 

Status 


1 

1 

EZ 

2904 Enable Zero j 

j 

1 

EC 

2904 Enable Carry | 

1 

ES 

2904 Enable Sign | 

1 

EOVR 

2904 Enable Overflow I 

1 

1 

OEM 

1 

2904 Enable Machine Status \ 

1 

OEMICRO 

i 

2904 Enable Micro Status | 
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A Microprogrammed CPU Using Ani29116 (continued) 
Microword Format (continued) 


Field Width 

Mnemonic 

Description 

Test 

4 

T 1 -T 4 

29116 or 2904 Test Status Instruction 

CCSEL 

3 

CCSEL 

Condition Code MUX Select 

Sequence Control 

4 

^0"l3 

2910 Instruction 

Branch Address 

12 

BA 

Next Micro Address 

78 
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A Microprogrammed CPU Using Am29116 (continued) 


Conclusion 

® Microprogrammability makes for easy and quick design 
of a customized architecture. 

® The powerful instruction set of the Am29116 is very 
suitable for CPU applications: 

- bit mampulation 

- multiple bit rotate 

- rotate and merge 

- rotate and compare 

- prioritize functions 

® We have shown a minimal configuration: 

- a PCU unit based on another Am29116, Am2901's 

or Am2930 PCU slices could increse the throughput. 
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A Microprogrammed CPU Using Am29116 (continued) 


Comparison with Super-16 
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A Microprogrammed CPU Using Am29116 (continued) 


The Super-16 


A 16-bit computer built from Am2900 Family parts to illustrate design techniques, 
Incorporates pipelining at the macro- and microprogram level. 



MM 
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A Microprogrammed CPU Using Am29116 (continued) 


Super-16 (continued) 

Processor Organization 

i Distinct sections 

- program control unit (PCD) 

- arithmetic and logic unit (ALU) 

- computer control unit (CCU) 

- data paths 

- memory control, clock control 

- I/O interface 

- interrupt section 

There is one important distinguishing feature as compared 
with the Am29116 CPU described in the last section; 

• The ALU and the PCU are separate. 

We will restrict our attention to this point. 
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A Microprogrammed CPU Using Am29116 (continued) 

Comparison with the Super-Sixteen 

The PCU: 

« Incorporates: 

- MAR (latch) 

- 4 Am2901 microprocessor slices 

- a 16-bit transfer register 

- a 16-bit bidirectional buffer ... called a transfer driver 

§ Controls the macroprogram sequencing 

i Is controlled by the microprogram 

§ Register assignments 
R0- program counter PC 
Rl- stack pointer 
R2- stack lower limit 
R3- stack upper limit 
R4- +2 ... a useful constant 

R5- +4 ... a useful constant 

R6,R7- not used but are available 
R8,R15- not used (wired to be disabled) 
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A Microprogrammed CPU Using Am29116 (continued) 

Comparison with the Super-Sixteen (continued) 


PCU (continued) 

§ Function 

- updating of the PCU 

- MAR <— (PC) for reading instructions or data from main memory 

- updating of the stack pointer 
checking stack limits 

® Communication 

- data to PCU from ALU via transfer register 

- data from PCU to Y-bus of the ALU via PCU 
transfer drivers 
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A Microprogrammed CPU Using Am29116 (continued) 

Comparison with the Super-Sixteen (continued) 


ALU 


« Designed from 

- 4 Am2903 superslices 

- Am2904 status and shift control logic 

- PSW register 

- 3 buses 

DA: ALU input from register (memory data) 
IMMD gate (immediate field of microcode) 

DB: ALU input from PSW 
Y: ALU output, input to RAM registers 

s RAM register selection from instruction register (I) 

- I 0 _ 3 : A address on Am2903 

- 14 , 7 : B address on Am2903 or 

control for CCMUX (in Am2904) in conditional 
branch instructions (macrolevel!) 
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A Microprogrammed CPU Using Am29116 (continued) 

Comparison with the Super-Sixteen (continued) 


ALU (continued) 

i Byte-wide operations are possible as well as word-wide: 

- only the lower 8 bits are affected 

- by disabling write-enable & output-enable for slices 3,4 

- word/byte MUX selects C,N,0VR from slice 2 (MSS) 

- force to zero for Z-signal 

@ Control for the ALU 

- A & B addresses, Am2904 CCMUX from macrolevel 17,0 

- ALU operations from microlevel Myg.gg 

- WORD signal Mg 0 (byte/word) 
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A Microprogrammed CPU Using Am29116 (continued) 

Comparison with the Super-Sixteen (continued) 

(Macro) Instruction Execution 

§ Form instruction address (PCU) 

- publish bus request at the beginning of the cycle 

- PC, MAR <— (PC) + 2 

- address bus <-- (MAR) 50ns prior to beginning of next cycle 

® Instruction fetch (main memory) 

- the main memory is fast enough to all reading in one cycle 

@ Decode (mapping PROM) 

- fetched instruction is routed through Z and Z1 registers 
to the instruction decoder (mapping PROM) 

- 8-bit opcode is address for the PROM giving the starting 
address for the microprogram to execute this instruction 

® Displacement fetch (main memory) 

- every instruction fetch is followed by another read cycle: 

, next instruction fetch for one word instructions 

. displacement fetch for two word instructions 

- decoding of the previous instruction determines how 
this data should be interpreted. 
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A Microprogrammed CPU Using Am29116 (continued) 

Comparison with the Super-Sixteen (continued) 

(Macro) Instruction Execution (continued) 

i Form operand address (ALU) 

- fetched displacement is sent through the Z and Z0 
register to the ALU (2903) 

- MAR <— (X 2 ) + d (50 ns before the end of the cycle) 

(See note 1) 

i Operand fetch (memory) 

9 Execute (ALU) (See note 2) 

- 2903's perform specified operation on the operand's 

- simultaneous with the last execution cycle 
the instruction decoder is enabled 


Notes: 1) Remember the instruction format for the CPU in 

the last section. This application is an emulation 
of the Super-16, so that the instruction formats 
match. 

2) Execution may take more than one microcycle. 
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Pipelining at the Macrolevel (continued) 


Register-to-Register Pipeline Operation 


Action 

A, B, C, D are RR instructions 

Form Instruction Address 

A 

B 

c 

D 










Fetch Instruction 


A 

B 

C 

D 









Decode 



A 

8 

C 

D 








Fetch Displacement 














Form Operand 

Address 








■ 

■ 

■ 

■ 



Fetch Operand 














Execute 




A 

B 

c 

D 






















1 

L 

3 

4- 

5 


7 








i Cycle 1,2: Compare with pipelining of the CPU in the last section 

@ Cycle 3: Address of a third instruction is formed 

{Super-16: Z,Z 0 ,Z 2 -registers; Other: Z-register) 

e Cycle 4: Simultaneous instruction execution and 

formation of instruction address 

Execution of A needs: 

- one microcycle for most instructions because 
of simultaneous decoding of B 

- more than one microcycle for a few instructions 
(e.g. I/O instructions). The pipeline stops. 


i After the first three cycles, every cycle produces a result. 
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Pipelining at the Macrolevel (continued) 
Register-to-Indexed-Storage Pipeline Operation 


Action 

A, B, C, D are RX instructions 

Form Instruction Address 

A 


B 



c 








Fetch Instruction 




B 



c 







Decode 



D 



B 



c 





Fetch Displacement 



A 



B 



c 





Form Operand Address 







B 



c 




Fetch Operand 





n 



B 



c 



Execute 






A 



B 



c 
















Cycle, 

n 

z. 



5 

4o 


8 

1 


in 

m 



s Cycle 2; The same actions as in the RR pipeline operation occurs 
(fetch instruction A, MAR <— PC + 2) 

@ Cycle 3: Form instruction address, memory read and decode 
are performed simultaneously 

The system doesn't "know" whether a displacement 
or the next instruction is fetched from memory. 

After the decode of instruction A it is determined 
to be a displacement 

® Cycle 4: Simultaneous form instruction address (PCU) and form 
operand address (ALU). This was impossible in the 
CPU of the last section. 


§ Cycle 5: Only one pipeline stage is active: memory read 
The operand is fetched 


Decoder is not enabled yet and therefore the 
upper part of the pipeline is stopped 
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Pipelining at the Macrolevel (continued) 


Register-to-Indexed-Storage Pipeline Operation (continued) 


s Cycle 6: Compare with Cycle 3 

Additional execute A cycle (ALU) 

In this cycle you see again the advantage of two 
separate devices for the PCU and the ALU. 

You can form instruction address and execute in 
the same microcycle. 


@ The pipeline needs 6 cycles from the beginning 
to produce the first result. 


i Every third cycle a result is produced. 
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Pipelining at the Macrolevel (continued) 
Branch on Condition RX Pipeline Operation 


Action 

A = RX Branch Instruction 

B = Next RX Instruction if branch is not taken 

K = next RX Instruction if branch is taken 

Form Instruction Address 

D 

■ 


K 

B+2 

K+2 

■ 

■ 

■ 






Fetch Instruction 

■ 

D 

■ 

B 

K 

B+2 

K+2 

■ 

■ 






Decode 



D 


■ 



■ 

■ 





Fetch Displacement 

■ 


D 


■ 

1 

8 

K 







Form Operand Address 

■ 


■ 


■ 

■ 




■ 




Fetch Operand 

■ 


■ 


■ 

■ 

■ 

■ 

B 

K 





Execute 

■ 


■ 

P 


A 3 




B 

K 





n 

L 



El 

U) 


3 







— 














Cycle 3: Form instruction-address (next instruction if branch is not taken) 
Fetch displacement of next instruction 
Decoding A detects a branch instruction 
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Pipelining at the Macrolevel (continued) 


Branch on Condition RX Pipeline Operation (continued) 


i Cycle 4: Execution of the branch instruction starts 

- Determination whether the condition is true 
or false is not yet possible. 

- Sequencing of the program is temporarily unknown 

To place the correct instruction in the Z-register 
when decoding of the next instruction (B or K) 
is enabled, form the instruction addresses of the 
two alternative instructions: 

- Form instruction address K (where K = (X 2 ) + d) 

- Instruction address B was formed in the last 
cycle and B can be fetched in this cycle. 


® Cycle 5: Fetch instruction K 

Executing A 2 determines whether the condition 
is true or false 

The cycle is long enough to form the correct 
instruction address (K+2 or B+2) 


• Cycle 6 : The last execution cycle A 3 enables the decoder. 

Decode either K or B as determined in cycle 5. 


e You see: The generation of the two alternative instruction addresses 
prevents the pipeline from flushing out totally. 
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Super-16 Microword Format 



ENABLE TRANSFER REG, 
LOAD TRANSFER REG. 
I-REG EN CTR 
l-REG INC/DEC 
PCU TRANS CHIP DISABLE 
PCU TRANSFER REG. 

LOAD MEMORY ADDR. REG. 
LOAD D-REG 
LOAD 21 INTO I REG, 
ENABLE 20 -» DA 
ENABLE PSW 
SHIFT CNT Am2910 ADDR. 
BRANCH INSTR. EN 


ENTREG 

LDTREG 

ENCTR 

INC 

PCUCD 

PCU - Y 

LDMAR 

LDD 

2I -* I 

EN20 

PSW 

SHTCNTEN 

BRIEN 


Am2901 F — B/Q 

Am2901 

Am2901 

Am2901 

Am2901 

Am2901 

Am2901 

Am2901 

Am2901 

Am2901 

Am2901 



o s 

_ Q n 

O? 3 3 
3 O 

£-5 


MICRO CONTROL WORD BIT DEFINITIONS 
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Super-16 Microword Format (continued) 




X 






X 




EN IMMEDIATE - DA BUS 

IMMD 

CD 

w 0 



ROM/IREGEN 

I/O CONTROL REG. EN 

ROM/I 

lOEN 

Vi 

& 

O) 

“ 0 

W 0 3 

w 0 * j* 



Am2914 INTERRUPTS DISABLE 

INTDIS 

U 1 

cn S. 



Am2914 ENI 0 -ENI 3 

INTRIEN 





Am2904 SHIFT EN 

SHFTEN 






CNTLB 7 






CNTLBs 





GENERAL 

CNTLB 5 

0 

0 



USE 

CNTLB 4 

(XI 

(£> 

. oj 0 



CONTROL 

CNTLB 3 

(X 

CO 




BITS 

CNTLBj 

CO 





CNTLB, 

CO 

cn 





CNTLBo 

CO 

Ul 






X 






X 




Am2904 OUT EN CONDITIONAL TEST 

OECT 

CO 

■&. 

CO 

CO 

- 



Am2904 EN ZERO 

EZ 




Am2904 EN CARRY 

EC 

CO 

ro 

(/> 

a 


Am2904 EN SIGN 

ES 

CO 

•3 " 
c 

0 


Am2904 EN OVERFLOW 

EOVR 

0 

w 

0 


Am2904 EN MACHINE STATUS 

CEM 

CD 


H 


Am2904 EN MICRO STATUS 

CEm 

CD 


a 

0 


Am2904 1,2 CARRY OUT CNTL 

'12 

•Nl 


r 

H 

Am2904 l„ CARRY OUT CNTL 

hi 

O) 


s 

0 

0 

0 






s 

Am2904 

Am2904 . 

TEST 5 

TEST 4 

M 

cn 

N3 


a 

H 

a 

c 

H 

m 

a 

Am2904 

Am2904 & Am25LS25l 

TEST 3 

TEST 2 

CO 

ro 

N5 

Test 

( 6 ) 

0 

m 

■n 

Am2904 & Am25LS251 

TEST, 



z 


Am2904 & Am25LS251 

TESTo 

0 


H 

0 





CO 

0 

5 


Am2910 I 3 

Am2910 I 2 

NAC 3 

NAC 2 

CD 

CD 


Am2910 I, 

NAC, 

•si 

“rS 



Am2910 Iq 

NACq 

Ol 

v> 




M,5 






M,4 






M,3 

(2 





M ,2 






M,, 






M 10 

M 9 

0 

CD 

> 

O' z 

S’ ? 




Ma 

CD 

- as — 




M, 

Ma 

M 5 

sj 

a> 

cn 

“3 ^ 

0 ° 
a. 




M 4 






M 3 

CO 





Mj 

fu 





M, 

- 





Mo 

0 
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Comparing Arn2901, Am29203, Am29116 
for General Purpose CPU's 







Comparison of Features as 

Operation 

General Purpose CPU's 

Am290lC 

Am29203 

Am29116 

16-bit ADD 

83 ns (max) 

120 ns (max) 

100-120 ns 

(max) 

16-bit MULT 

1600 ns (max) 

1800 ns (max) 

9000 ns 

(max) 




300 ns 

(max with Am29516/7) 

BCD ADD 

1000 ns (max) 

120 ns (max) 

1000 ns 

(max) 

Data Word Width 

4-256 Bits 

4-256 Bits 

8 or 16 

Bits 

# Registers 

16 only 

16,32,48,64 

32 only 


8-Bit Rotate 

800 ns (max) 

1000 ns (max) 

100-120 

ns (max) 

8-Bit Rotate & Merge 

1000 ns (max) 

1300 ns (max) 

100-120 

ns (max) 

Devices for 

4 X Am290lC 

4 X Am29203 

1 X Am29116 

16-bit ALU 

1 X Am2902 

1 X Am2902 




1 X Am2904 

1 X Am2904 




3 X SSI 




Typical power 

5.2 W 

5.6 W 

2.2 W 


for 16-bit ALU 






15 


6-435 ED29116 6-435 
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Performance Analysis 


Processor Type 
8048 


Instruction Type 
Bit Test Bit Set 


5,000ns 


2,500ns 


Am9080 


4,250ns 


1,750ns 


AmZ8000 


1,000ns 


1,000ns 


Am2901B (4) 
Am2904 
Am2902A 
25LS2538 (2) 


100ns 


100ns 


Number of 
Instructions 


Am2901B (4) 

Am2904 

Am2902A 

25LS2538 (2) 

25S10 (8) 

1 

100ns 

1 

100ns 

Am29116 1 


1 

! 

i 

i 

i 

100ns 

100ns 


ixecution 

Time 
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Performance Analysis (continued) 


Instruction Type 


Processor Type 


Rotate by N Rotate and Merge 


Am9080 


212,500ns I 252,500ns 


113,000ns 138,000ns 


AmZ8000 


11,250ns 


15,250ns 


Am2901B (4) 
Am2904 
Am2902A 
25LS2538 (2) 


Am2901B (4) 
Am2904 
Am2902A 
25LS2538 (2) 
25S10 (8) 


Am29116 


1,025ns 


1,325ns 


Number of 
Instruction' 



ixecution 

Time 
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Performance Analysis (continued) 


Processor Type 


Instruction Type 

16-bit ADD 16-bit Multiply 



Key 


Number of 


Execution 

Time 
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Performance Analysis (continued) 

Are you familiar with the Am25S10? 

How does it help speed up shifts and rotates on Am29iZil systems? 
It is a four-bit high-speed shifter: 

- It shifts four bits of data 0,1,2 or 3 places. 

- Several devices can be connected to: 

. perform shifts of 0,1,2 or 3 places 
on words of any length. (# of 25S10's equals # of bits/4) 


. perform a complete end-around barrel shift. 
(# of 25S10's needed equals # of bits/2) 
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CHAPTER 7 


Exercises - Part 2 
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Exercises - Part 2 


1. You have 8 ASCII characters that normally occupy 64 bits of 
memory. Since each byte has in fact only 7 active bits, it 
may be useful to pack these bytes into only 56 bits for 
storage on a disk. In writing these bytes to a disk you wish 
to pack them into a 56-bit contiguous frame by discarding the 
parity bit from each byte (i.e. the bit in the most 
signifigant position). Write the microinstructions for the 
Am29116 for packing these 64 bits into 56 bits. Assume that 
the characters are initially in R0 thru R3 and are to be 
packed into R4 thru R7. 

2. Write the code to perform a 16 x 16-bit unsigned integer 
multiplication. Write your code on the assumption that the 
operands are already in the RAM or on the assumption that the 
operands are to be supplied by your code as immediate values. 


3. If you have time, try the following exercise: 

The Am29116 can be very effective in arbitrating requests for 
service from several different sources. Suppose there are 
eight sources of service requests and the Am29116 has already 
serviced all requests from sources S7,S6 and S5. Write the 
code to cause the Am29116 to branch to the service routine 
associated with the highest priority source of the group 
S4,S3,S2,S1,S0 while ignoring S7,S6 and S5. 
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Solutions for Exercises - Part 2 


Problem #1: Packing ASCII characters 


Example 


Even parity 

0100 0100 1100 0011 0100 0010 0100 0001 


SOR, B, MOVE, SORA, R0 
ROTM, W, 15, MRAI, R0 
IMME H#3F80 


;0000 0000 0100 0001 —> ACC 


U: 0100 0010 0100 0001 

rot U: 1010 0001 0010 0000 

R: 0000 0000 0100 0001 

mask: 0011 1111 1000 0000 

ACC: 0010 0001 0100 0001 


B without A without 
parity parity 

ROTM, W, 14, MRAI, Rl 
IMME H#C000 

U: 0100 0100 1100 0011 

rot U: 1101 0001 0011 0000 

R: 0010 0001 0100 0001 

mask: 1100 0000 0000 0000 


ACC: 1110 0001 0100 0001 


two last bits of C 
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Solutions for Exercises - Part 2 (continued) 


Microinstructions for Packing ASCII Characters 


SOR 

B, MOVE, SORA, R0 



ROTM 

W, 15, MRAI, R0 

IMME 

H # 3F80 

ROTM 

W, 14, MRAI, R1 

IMME 

H # C000 

SOR 

W, MOVE, SOAR, R4 



ROTRl 

W, 14, RTRA, R1 



ROTM 

W, 13, MRAI, R1 

IMME 

H # 0FE0 

ROTM 

W, 12, MRAI, R2 

IMME 

H # F000 

SOR 

W, MOVE, SOAR, R5 



ROTRl 

W, 12, RTRA, R2 



ROTM 

W, 11, MRAI, R2 

IMME 

H # 03F8 

ROTM 

W, 10, MRAI, R3 

IMME 

H # FC0 

SOR 

W, MOVE, SOAR, R6 



ROTRl 

W, 10, RTRA, R3 



ROTM 

W, 9, RAI, R3 

IMME 

H # 00FE 

SOR 

B, MOVE, SOAR, R7 
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Solutions for Exercises - Part 2 (continued) 

Problem #2: Unsigned Integer Multiplication 
Some observations... 

0 Result is a 32-bit value. 

The Am29116 does not provide a double length shift directly. 

0 Remember that the Am29116 has a single port RAM architecture. 

- try to minimize operations which need two register operands 

i.e. - shift the result, not the multiplicand 
(only the result needs two registers) 

- use the ACC or D-latch for the multiplicand 

- in the double-precision addition use zero as 
as an immediate value 
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Exercise Solutions (continued) 

Unsigned Multiplication (16 x 16 bit) 


Symbolic 

Address 

Branch 

Address 

Sequencer 

CC 

Am29116 

Instruction 

SRE 

Comment 



cont 


SOR,MOVE,SOIR,R00 

X 

Multiplier --> R00 



cont 


Multiplier 

X 

Immediate value 



cont 


SONR,MOVE,SOI,NRA 

X 

' 

Multiplicand —> ACC 



cont 


Multiplicand 

X 

Immediate value 



cont 


SOR,MOVE,SOZR,R02 

X 

Initialize MSH of product to 0 


! 16 

PUSH 

PASS 

SOR,MOVE,SOZR,R03 

X 

Initialize LSH of product to 0 

Load counter & push loop address 

LOOP: 


cont 


SHFTR,SHRR,SHUPZ,R03 

0 

Shift LSH of product by one. 

Load LSB with zero. 



cont 


SHFTR,SHRR,SHUPL,R02 

X 

Shift MSH of product by one. 

Load LSB with QLINK. 



cont 


SHFTR,SHRR,SHUPZ,R00 

0 

Shift multiplier. 

Load LSB with zero. 


SKIP 

CJP 

CT(29116) 

TEST,TL 

X 

Branch to SKIP if MSB was zero. 



cont 


TORI,TORAR,ADD,R03 

0 

Double precision ADD 
to accumulate partial product ... 
LSH: R03+ACC —> R03 

1 


cont 


TOR1,TORIR,ADDC,R02 

X 

MSH: R02+0+C —> R02 



cont 


0 

X 

Immediate zero 

SKIP: 


RFCT 



X 

Loop back to "LOOP" 
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4^3 Exercise Solution 



DRIVE 1 REQUEST H 
DRIVE 2 REQUEST H 
DRIVE 3 REQUEST H 

DRIVE 4 - L 

DRIVE 5 REQUEST H 
DRIVE 6 REQUEST H 
DRIVE 7 REQUEST H 
DRIVE 8 — L 


Instruction executed; 

PRT1 B,PR1A, PRID, R3 & DIE 
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APPENDIX A 

A BRIEF REVIEW OF NUMBER THEORY 
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A Brief Review of Number Theory 


Number theory has application in relation to several products of Advanced 
Micro Devices, including the Am29500 Family of signal and array-processing 
devices and the Am9520/Am8065 Burst Error Processor. The theory also is 
important in cryptography and in random-number generation. What follows is a 
brief summary, without proofs, of selected number theory results and related 
notation. 

For proofs and much more material see: 

J.V. Uspensky and M.A. Heaslet, Elementary Number Theory 
(McGraw-Hill, New York 1939) 

H.J. Nussbaumer, Fast Fourier Transform and Convolution Algorithms 
(Springer-Verlag, New York 1982) 


If 'a' and 'b' are integers, with 'b' positive, the division of 'a' by 
'b' is defined by: 

a=bq+r , 0<r<b , 

where 'q' is called the quotient and 'r' is called the remainder. When 'r' is 
zero, 'b' and 'q' are factors or divisors of 'a‘. To indicate that 'b' is a 
factor of 'a' we write "bla", which we can read as "'b' divides 'a'". When 
'a' has no divisors other than 1 and 'a', 'a' is a prime. In all other cases, 
'a' is composite. 

All integers which yield the same remainder when divided by 'b' are said 
to be congruent modulo 'b'. To say that 'c' is congruent to 'd' modulo 'b' 
we write 


c e d modulo b. (but we will write = for = from here on) 
This condition will be true if b|(c-d). 
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Modulus arithmetic is similar to ordinary arithmetic: 

(aiia2) modulo b = ((ai modulo b) t {az modulo b)) modulo b 
(a 2 ^'a 2 ) modulo b = (aj^ modulo b)’(a 2 modulo b) modulo b 

However, division is a little more complicated. If 
na]^=na 2 modulo b 

we cannot always cancel the n's and conclude 
ai=a 2 modulo b. 

What is true is that 

aj^=a 2 modulo (b/d), 

where d is the greatest common divisor of 'n' and 'b' which we write as 
d=(n,b). 

Only if d=l can we cancel the n's. That is, we can only divide by 
numbers that are "relatively prime" to the modulus. 


It proves useful to define a function which designates the number of 
integers that are smaller than a given integer, 'm‘, and are relatively prime 
to 'm'. We call this Euler's totient function and write it as "^(m)". 

If 'm' is a prime then cp (m) = m-1. 

If 'm' is composite such that 

m = p^al.p2a2.p^a3. __ .p^as Pl,P 2 ,P 3 .Ps all prime, 

then 

Pl"l P2“l P3-I Pc-l 

^ (m) = m • - • _ 

Pi P 2 P 3 Ps 
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We now have enough notation to state Euler's Theorem ; 

If (a,m) = 1 then 

^ modulo m. 

Euler's Theorem allows us to perform division with respect to a modulus 
in a more general way than we discussed above. Suppose we want to solve for 
'x' given this linear congruence with one unknown: 

ax=c modulo m 

Using Euler's Theorem 

ax=c'a^^'’’^modulo m 

and provided (a,m)=l we may divide both sides by 'a' so that 
x=c*a*^^'"^"^ modulo m 

and we can see that is the reciprocal of 'a' with respect to modulus 

'm'. 

One of the uses of Euler's Theorem relates to its application to the 
Chinese Remainder Theorem. 
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The Chinese Remainder Theorem states that simultaneous linear congruences 
in one unknown can be solved: 

Let be k positive integers greater than 1 and relatively 
prime in pairs. The set of linear congruences x=r^- modulo m-j 
has a unique solution modulo M, where M=m]i"m 2 ’m 3 - ... "m|^. 


The calculation of x from the r^'s and m^'s is called the 
Chinese remainder reconstruction. It can be shown that 


X _ \ /_I ^ r2 + ... + /_ Y ^ '"k iTiodulo M. 

\Mi/ \M2/ UJ 

That is, 'x' is obtained from a linear combination of the remainders: 

X = A]^r;[^+A 2 r 2 + ... +A|^r|^ modulo M 

We can note that for x=l, all r^'s also equal 1. Hence we see that the 
coefficients, A^, also sum to 1: 

A 1 +A 2 + ... +A|^ = 1 modulo M 

which is a useful check on the correctness of a set of such Chinese remainder 
reconstruction coefficients which we may have calculated. 


It will often be easier to calculate the coefficients of the r.'s by 
knowing that 


M 


modulo 


m. 


m,- 




modul 


0 mi^ 


where the equivalent expression on the RHS above can be calculated on a 
machine with a shorter capacity for integers than is required for the LHS. 
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An Example of Chinese-Remainder Reconstruction 


Let us apply the above material to the problem of calculating the loca¬ 
tion of an error burst detected by an Am9520/Am8065 Burst Error Processor 
using its 32-bit polynomial. In this case, the location in bits will be given 
modulo 21 and 2047. These moduli have no common factors and hence are rela¬ 
tively prime. We may expect from the Chinese Remainder Theorem that we can 
calculate the location modulo M=21*2047=42,987. 


We will need the totients for the two factors: 

21 is composite, thus ^(21) = ^(3’7) = 21 • 3-1 * 7-1 = 12 

' ' 3 7 

2047 is also composite, thus ®(2047) (23*89) = 2047 * 23-1 * 89-1 = 1936 

' 23 89 
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To understand how to proceed to 
reconstruction coefficients, let us begin 


calculate the Chinese remainder 
with the second coefficient: 





But = 



1936 


211936 42,987. 


65547269477777817757200969720480807996746850662145197381218624584202721941899507 

21423897202387805933094706460527736304494713254745339081373600441413238843120193 

32585827514379333859761233117712533509551459443911208580632578963327992769384375 

42603036057970228687681011232749791880072864832561980276282272591617574260963299 

10793538286544760653100399786955431363161625429428147166799811460139502247280241 

48421990224184032422659001952950125880854620435176178676536304017189886727728577 

66478762808632533720316541630429665323923628823770688465926358082614916364897819 

95299608995721341133829535195688330579297981894772229974859050987993820371499794 

81395954754540598720417165654121078033944840123118486698356560103988760885806716 

49401708128661849049145563730469483530514180432470512369426690573967032574682856 

19505758551358401503702558533907289929937210320299658444316439393747438560146636 

68039649829256742652205700088710439478948136143462730131378420168689745030305135 

30381506509830279486809578859037463252657776488671886012947491670732595286632623 

51238881264892457976606743296108336267285279283503060928469489247666663467542882 

42872774392908778211927604044280555600130652888690727539976402131080690579575162 

33026218677122044778041620387985106584998777559109963128759282081825133237066491 

14360200478860036199638263326158639613593793313283634473525862578760726807117983 

83504786413185684889586777243776418266864603084497027156863545962997632912488928 

24382666238233440270657440539167330004997768395266387139798524809164723581467953 

70122867501116649732338446947707912921938208517168298067784043392255318744545510 

48987827341190093854105444095401537664898756640898301015058812731231352290137586 

72561948104908089710618342205542107240251659744640805787879248125748563346533914 

49216499047922658518221629883204672382218062598860097198121309519710514803739640 

47732062609858999542190657569218137188675174633600219492324116932670268519947893 

95154782422791252339103131937353661659594851341730694387344197857885617918745707 

51567134183215224267935955780888418997411766677983604449691049545327112018360127 

56862721010965881665681274008574228137556218007157472347276882310366231153287428 

90393414227090838945939699975151036281500004075438705477189761562777940953711229 

20344720978570249941195753494856222496335355113341500070914900206483837650114413 

19486308794026419003550913064591870027487527693765073802654682863900464126535037 

46847030398846937549187409894959831291292882913809459139940065827331105367368931 

14522615452826162207116348274817955572941952657616714150555124862263651922390721 


No kidding! ... 2,560 digits. 
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It is inconvenient to deal with such large integers. Thus it is clear 
that doing the multiplications to get the 1936th power then reducing this 
large result modulo 42,987 is a difficult route to get A 2 . Since the power is 
a relatively small number, we could alternately do one multiplication, reduce 
this partial result mpdulo 42,987 and then repeat this process until we have 
reached the 1936th power. By this route we will never have to produce a 
partial result that exceeds (42,987-l)x21=902,706. This 6-digit integer is 
within the range of a pocket calculator. However, multiplying 1936-1=1935 
times and reducing the result moduto 42,987 each time is still more tedious 
than necessary. 

At this point it is useful to note that we can reach large powers rapidly 
by repeated squaring. That is, to raise a number 'a' to the power 2^ we just 
square 'a' 'b' times; 


,16 . ((( a2 ) 2 ) 2)2 


That is, we can get the 16th power by 4 multiplications rather than by 16-1=15 
multiplications. 


10 9876543210 
1936 in binary is 11110010000 


Thus al936 





which we can calculate by only (10+9-s-8+7+4)+4=42 multiplications and , by far 
fewer multiplications if we can use the calculation of the smaller powers as a 
start towards the larger powers. 
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All of this can be consolidated in a short BASIC language program that 
calculates integer powers by converting the power to binary and accumulating 
the power by the squaring method while building the larger terms from the 
smaller; 


2000 'Integer Power Subroutine 
2020 'Calculates A=BASE''PWR mod M 
2040 'Alters BASE & PWR 
2060 ' 

2080 A=r initial partial result 

2100 IF PWR MOD 2 = 1 THEN A = (A*BASE) MOD M 

' update partial result if next bit of PWR is one 
2120 PWR=INT(PWR/2)' right shift pwr 

2140 IF PWR=0 THEN RETURN' done if all bits of PWR have been used 
2160 BASE=(BASE*BASE) MOD M' square again 
2180 GOTO 2100 

By the use of this program or other means we obtain 

A2 = 4095 ** 


And similarly we get Aj^ 



modulo 42,987 


= 2047^^ modulo 42,987 


= 38,893 ** 

Note that A 2 +A 2 = 42,988 = 1 modulo 42,987 (as is expected for valid 
coefficients). Of course, if we were absolutely sure of the value for A 2 , we 
could have obtained A^ from 42,988-A2. 

Thus, if the error location was known to be 16 mod 21 and 1128 mod 2047 
then the location is 


L = (38,893*16+4095*1128) modulo 42,987 

= 40,021 (Checking: 40,021 = 1,905*21+]^ 

& 40,021 = 19*2047+1128 ) 

** Note: Am9520/Am8065 data sheets show A^ and A 2 as above but 
multiplied by 8 for convenience in using byte counts 
rather than bit counts in calculating error locations. 
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